Ep 020: Data Dessert
Christoph and Nate discuss the flavor of pure data.
- "The reduction of the good stuff."
- "We filter the points and reduce the good ones."
- Concept 1: To use the power of Clojure core, you give it functions as the "vocabulary" to describe your data.
- "predicate" function: produce truth values about your data
- "view" or "extractor" function: returns a subset or calculated value from your data
- "mapper" function: transforms your data into different data
- "reduction" (or "reducer") function: combines your data together
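A minimal sketch of the four kinds of "vocabulary" function working together. The entry shape (:date, :task, :minutes) and every function name here are assumptions made up for illustration, not code from the episode.

```clojure
;; Hypothetical sample data: the entry shape is assumed for illustration.
(def entries
  [{:date "2019-01-07" :task "email"  :minutes 30}
   {:date "2019-01-07" :task "coding" :minutes 240}
   {:date "2019-01-08" :task "coding" :minutes 90}])

;; Predicate: produces a truth value about one entry.
(defn long-entry? [entry]
  (>= (:minutes entry) 60))

;; View / extractor: returns a calculated value from one entry.
(defn hours [entry]
  (/ (:minutes entry) 60.0))

;; Mapper: transforms an entry into different data.
(defn with-hours [entry]
  (assoc entry :hours (hours entry)))

;; Reduction function: combines the running total with one more entry.
(defn add-hours [total entry]
  (+ total (:hours entry)))

;; Clojure core supplies the machinery; the functions above supply the words.
(defn total-long-hours [entries]
  (->> entries
       (filter long-entry?)
       (map with-hours)
       (reduce add-hours 0)))

(total-long-hours entries) ; => 5.5
```

Read aloud, the pipeline says what it does: filter the long entries, map them with hours, reduce by adding hours.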
- Concept 2: Don't ignore the linguistic aspect of how you name your functions.
- Reading the code can describe what it is doing.
- Good naming is for humans. Clojure doesn't care.
- Concept 3: Transform the data source into a big "bag" of data that is true to the structure and information of the source.
- Source data describes the source information well and is not concerned with the processing aspects.
- Transform into data that is useful for processing.
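One way this two-step transform might look. The line format ("date start end description") is a hypothetical stand-in for whatever the real source contains, and the function names are invented for this sketch.

```clojure
(require '[clojure.string :as str])

;; Hypothetical source format: "date start end description".
(def source-lines
  ["2019-01-07 09:00 17:30 client work"
   "2019-01-07 21:00 23:45 deploy"])

;; Step 1: stay true to the source. Keep each field as it appears.
(defn line->entry [line]
  (let [[date start end description] (str/split line #" " 4)]
    {:date date :start start :end end :description description}))

;; Helper: "HH:MM" string to minutes since midnight.
(defn time->minutes [hh-mm]
  (let [[h m] (str/split hh-mm #":")]
    (+ (* 60 (Long/parseLong h)) (Long/parseLong m))))

;; Step 2: add what processing needs, without losing the source fields.
(defn with-duration [entry]
  (assoc entry :minutes (- (time->minutes (:end entry))
                           (time->minutes (:start entry)))))

(map (comp with-duration line->entry) source-lines)
```

Keeping the two steps separate means the "bag" from step 1 can feed other transforms later without re-reading the source.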
- Concept 4: Using loop + recur for data transform is a code smell.
- Not composable: encourages shoving everything together in one place.
- "End up with a ball of mud instead of a bag of data you can sift through."
- "You know what mud sticks to really well? More mud! It's very cohesive! And what couldn't be better than cohesive programs!"
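To make the contrast concrete, here is the same calculation written both ways. The points, the good? rule, and the function names are all hypothetical; the sketch only illustrates the shape of each approach.

```clojure
;; Hypothetical data: [x y] points. "Good" is assumed to mean
;; "inside the unit square".
(def points [[0.5 0.5] [2 3] [0.25 1] [-1 0]])

(defn good? [[x y]]
  (and (<= 0 x 1) (<= 0 y 1)))

;; loop + recur: selecting, extracting, and accumulating all live in
;; one body, so none of the pieces can be reused or tested alone.
(defn sum-good-xs-loop [points]
  (loop [remaining points
         total 0]
    (if (empty? remaining)
      total
      (let [[x _ :as point] (first remaining)]
        (recur (rest remaining)
               (if (good? point) (+ total x) total))))))

;; filter + map + reduce: each step is a separate, swappable piece.
(defn sum-good-xs [points]
  (->> points
       (filter good?)
       (map first)
       (reduce + 0)))

(sum-good-xs points) ; => 0.75
```

Both versions return the same answer, but only the second lets you stop after the filter step at the REPL and sift through the good points on their own.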
- Concept 5: Use loop + recur for recursion or blocking operations (like core.async).
- Data shows up asynchronously
- Useful when logic is more naturally expressed as recursion than filter + map + reduce.
- Concept 6: Duality: stepwise vs aggregate
- Stepwise problem: advance a game state, apply async event, stream processing, etc.
- Stepwise: reduce, loop + recur
- Aggregate problem: selecting the right data and combining it together.
- Aggregate: filter + map + reduce
- Aggregate problems tend to be eager--they want to process the whole data set.
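A small sketch of the stepwise side of the duality: advancing a game state one move at a time with reduce. The state shape and function names are assumptions for illustration, loosely in the spirit of a tic-tac-toe board.

```clojure
;; Hypothetical game state, assumed for illustration.
(def initial-state {:board {} :next-player :x})

(defn other-player [player]
  (if (= player :x) :o :x))

;; Step function: takes the current state and one move,
;; returns the next state.
(defn apply-move [state position]
  (-> state
      (assoc-in [:board position] (:next-player state))
      (update :next-player other-player)))

;; Stepwise: reduce threads the state through each move in order.
(defn play [moves]
  (reduce apply-move initial-state moves))

;; reductions returns every intermediate state, starting with the
;; initial one, which is handy for inspecting a game at the REPL.
(defn history [moves]
  (reductions apply-move initial-state moves))

(play [[0 0] [1 1] [2 2]])
;; => {:board {[0 0] :x, [1 1] :o, [2 2] :x}, :next-player :o}
```

The step function only ever sees one state and one move, so the same apply-move could also be called from a loop + recur that waits for moves to arrive.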
- Concept 7: Use your bag of granular data to work toward a bag of higher-level data.
- We went from lines → entries → days → weeks
- "Each level of data allows you to answer different questions."
- Concept 8: Duality: higher-level data vs granular data with lots of dimensions
- E.g. having a single "day" record vs a bunch of "entry" records that all share the same "date" field.
- The "right" choice depends on your usage pattern.
- Dimensional data tends to stay flat, but high-level data tends toward nesting.
- A high-level record is a pre-calculated answer you can use over and over quickly.
- A highly dimensional, granular record allows you to "ask" questions spanning arbitrary dimensions. E.g. "What weeknights in January did I work past midnight?"
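A sketch of both sides of this duality using group-by. The entry fields (:weekday, :end-hour, :minutes) and the "late" cutoff of 22 are assumptions chosen for the example.

```clojure
;; Hypothetical granular records: flat, with several dimensions each.
(def entries
  [{:date "2019-01-07" :weekday :monday  :end-hour 17 :minutes 480}
   {:date "2019-01-07" :weekday :monday  :end-hour 23 :minutes 60}
   {:date "2019-01-08" :weekday :tuesday :end-hour 18 :minutes 510}])

;; Higher-level data: one nested "day" record per date, with a
;; pre-calculated total. Sorted so the result order is predictable.
(defn entries->days [entries]
  (->> (group-by :date entries)
       (map (fn [[date day-entries]]
              {:date date
               :total-minutes (reduce + (map :minutes day-entries))
               :entries day-entries}))
       (sort-by :date)))

;; Granular data: ask a question that cuts across dimensions.
(defn worked-late? [entry]
  (>= (:end-hour entry) 22))

(defn late-weekdays [entries]
  (->> entries
       (filter worked-late?)
       (map :weekday)
       distinct))

(map :total-minutes (entries->days entries)) ; => (540 510)
(late-weekdays entries)                      ; => (:monday)
```

The day records answer "how long did I work each day?" without recomputing anything, while the flat entries stay open to questions nobody planned for.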
- Concept 9: Keep it pure. Avoid side effects as much as possible.
- Pure functions are the bedrock of functional programming.
- REPL and unit test friendly.
- "You can use data without hidden attachments. You remember side effects when you're writing them, but you don't remember them three months later."
- Concept 10: Keep I/O at the "edges" with pure functions in the "middle".
- "I/O should be performed by functions that you didn't write."
- Use pure functions to format your data so you only have to hand it off to the I/O function. E.g. create a list of "line" strings to emit with (run! println lines).
- You can describe your I/O operations in data and make a "boring" function that just follows them. This allows you to unit test the complicated logic that determines the operations.
- Separates I/O-specific problems (e.g. retries, I/O exceptions) from business logic problems.
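A rough sketch of keeping I/O at the edges. The day records, the line format, and the :print-line operation are all hypothetical; only run! and println come from the episode.

```clojure
;; Hypothetical day records, assumed for illustration.
(def days
  [{:date "2019-01-07" :total-minutes 540}
   {:date "2019-01-08" :total-minutes 510}])

;; Middle: pure formatting. Easy to check at the REPL or in a unit test.
(defn day->line [{:keys [date total-minutes]}]
  (str date ": " (quot total-minutes 60) "h " (rem total-minutes 60) "m"))

(defn report-lines [days]
  (map day->line days))

;; Edge: the only side effect, performed by functions we did not write.
(defn print-report! [days]
  (run! println (report-lines days)))

;; Alternative: describe the I/O as data...
(defn plan-operations [days]
  (for [line (report-lines days)]
    {:op :print-line :text line}))

;; ...and let a "boring" function follow the plan.
(defn execute! [operations]
  (doseq [{:keys [op text]} operations]
    (case op
      :print-line (println text))))

(report-lines days)
;; => ("2019-01-07: 9h 0m" "2019-01-08: 8h 30m")
```

Tests can assert on report-lines and plan-operations without capturing any output; concerns like retries would belong in execute!, away from the formatting logic.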
Related episodes:
- 002: Tic-Tac-Toe, State in a Row
- 012: Fiddle with the REPL
- 018: Did I Work Late on Tuesday?
- 019: Dazed by Weak Weeks
Clojure in this episode:
- filter, map, reduce
- loop, recur
- group-by
- run!
- println