Ep 035: Lifted Learnings

June 28, 2019

Christoph and Nate lift concepts from the raw log-parsing series.

Reflecting on the lessons learned in the log series.
(01:15) Concept 1: We found Clojure to be useful for devops.
- Everything is a web application these days.
- "The only UIs in Devops are dashboards."
- For most of the series, our UI was our connected editor.
- We grabbed a chunk of the log file and were fiddling with the data in short order.
- We talk about connected editors in our REPL series, starting with Episode 12.
- Being able to iteratively work on the log parsing functions in our editor was key to exploring the data in the log files.
(04:04) Concept 2: Taking a lazy approach is essential when working with a large data set.
- Lazily going through a sequence is reminiscent of database cursors. You are at some point in a stream of data.
- We ran into some initial downsides.
- With with-open, fully lazy processing results in an I/O error, because the file has already been closed by the time the sequence is consumed.
- Don't be too eager too early, or the entire data set will reside in memory.
- Two kinds of functions: lazy and eager.
  - Lazy functions only take from a sequence as they need more values.
  - Eager functions consume the whole sequence before returning.
- Ensure that only the last function in the processing chain is eager.
- "It only takes one eager to get everybody unlazy."
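
A minimal sketch of the with-open trade-off described above. The function names and the "ERROR" substring check are hypothetical, chosen only for illustration; they are not the code from the series.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Broken: the lazy sequence escapes with-open, so the reader is closed
;; before most lines are read. Realizing the rest throws an IOException.
(defn lines-escaping [path]
  (with-open [rdr (io/reader path)]
    (line-seq rdr)))

;; Works: every step is lazy except the final count, which runs
;; while the reader is still open.
(defn count-errors [path]
  (with-open [rdr (io/reader path)]
    (->> (line-seq rdr)                       ; lazy
         (filter #(str/includes? % "ERROR"))  ; lazy
         count)))                             ; eager, and last in the chain
```

Because count is the only eager step and it sits at the end, the file is read line by line instead of being loaded into memory first.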
(08:38) Concept 3: Clojure helps you make your own lazy sequences using lazy-seq.
- Clojure has a deep library of functions for making and processing lazy sequences.
- We were able to make our own lazy sequences that could then be used with those functions.
- Wrap the body in lazy-seq and return either nil (to indicate the end) or a sequence created by calling cons on a real value and a recursive call to the same function.
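
The lazy-seq recipe above can be sketched with a small stand-in example. The countdown function is hypothetical and unrelated to log parsing; it only shows the shape of the pattern.

```clojure
;; A lazy countdown from n to 1.
(defn countdown [n]
  (lazy-seq
    (when (pos? n)                     ; when returns nil at the end, ending the sequence
      (cons n (countdown (dec n))))))  ; a real value plus a recursive call

(take 3 (countdown 1000000))
;; => (1000000 999999 999998)
```

Only the three requested values are computed. The result works with the rest of the sequence library (map, filter, take and so on) like any other lazy sequence.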
(12:41) Concept 4: We work with information at different levels, and that forms an information hierarchy.
- The data goes from bits to characters to lines, and then we get involved.
- We move from lines on up to more meaningful entities. Parsed lines are maps that have richer information, and then errors are richer still.
- Our parsers take a sequence and emit a new sequence that is at a higher level of information.
- We first explored this concept in the Time series.
- The transformations from one level to the next are all pure.
(14:53) Concept 5: Sometimes you have to go down before you can go up again another way.
- We pre-abstracted a little bit, and only accepted lines that had all of the data we were looking for (time, log level, etc.).
- Exceptions broke that abstraction, so we reworked our "parsed line" map to make the missing keys optional.
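
A rough sketch of lifting lines to parsed-line maps with optional keys, as described in Concepts 4 and 5. The line format ("date time LEVEL message"), the regex, and the :log/... key names are assumptions made for illustration, not the format or keys used in the series.

```clojure
;; Assumed line format: "<date> <time> <LEVEL> <message>"
(def line-pattern #"(\S+ \S+) (\w+) (.*)")

(defn parse-line
  "Pure: one line in, one map out. Keys other than :log/raw are optional."
  [line]
  (if-let [[_ time level message] (re-matches line-pattern line)]
    {:log/raw line :log/time time :log/level level :log/message message}
    {:log/raw line}))  ; e.g. a stack-trace line: only the raw text is known

(defn lines->entries
  "Lifts a sequence of lines to a lazy sequence of parsed-line maps."
  [lines]
  (map parse-line lines))
```

Lines that do not match still travel up a level, so a later step can attach them to the error they belong to instead of losing them.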
(15:54) Concept 6: Maps are flexible bags of dimensions. They are a set of attributes rather than a series of rigid slots that must be filled.
- Functions only need to look at the parts of the map that they need.
- Every time we amplify the data, we add a new set of dimensions.
- Thanks to namespacing, all of these dimensions coexist peacefully.
- Multiple levels of dimensions give you more to filter/map/reduce on.
- Just because you distill doesn't mean you want to lose the essence.
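
One way to picture maps as bags of dimensions, as a sketch. The :log/... and :error/... keys and the "Timeout" check are hypothetical examples, not keys taken from the series.

```clojure
(require '[clojure.string :as str])

(def entry
  {:log/time    "2019-06-28 12:00:02"
   :log/level   "ERROR"
   :log/message "Timeout talking to db"})

;; Looks at a single key and ignores everything else in the map.
(defn error? [entry]
  (= "ERROR" (:log/level entry)))

;; Amplifies the data: adds a dimension in a new namespace and keeps the old ones.
(defn add-error-kind [entry]
  (assoc entry :error/kind
         (if (some-> (:log/message entry) (str/includes? "Timeout"))
           :timeout
           :unknown)))
```

After add-error-kind, the map still carries :log/time, :log/level and :log/message next to :error/kind, so later steps can filter or group on either set of keys.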
(21:09) Concept 7: Operating within a level of information is a different concern than lifting up to a higher level of information.
- Within a level, functions aid in filtering and aggregating.
- Between levels, functions recognize patterns and groupings to produce higher levels of information.
- Make the purpose of the function clear in how you name it.
- Separate functions that "lift" the data from functions that operate at the same level of information.
- When exploring data, you don't know where it will lead, so start by moving the data up a level in small steps.
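
A small sketch of the naming split between the two kinds of functions. The :log/user key, the :user/... keys, and both function names are hypothetical and chosen only to illustrate the idea.

```clojure
;; Within a level: entries in, entries out.
(defn errors-only [entries]
  (filter #(= "ERROR" (:log/level %)) entries))

;; Between levels: entries in, per-user summaries out.
;; The x->y name signals that the function lifts the data.
(defn entries->user-summaries [entries]
  (for [[user user-entries] (group-by :log/user entries)]
    {:user/name        user
     :user/entry-count (count user-entries)}))

(entries->user-summaries
  (errors-only [{:log/user "ann" :log/level "ERROR"}
                {:log/user "bob" :log/level "INFO"}
                {:log/user "ann" :log/level "ERROR"}]))
```

Note that group-by is eager: it consumes the whole input before returning, so a lift like this belongs at the end of a lazy chain.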

Clojure in this episode:

lazy-seq, cons
with-open
