Ep 030: Lazy Does It
Christoph's eagerness to analyze the big production logs shows him the value of being lazy instead.
- Last time: going through the log file looking for the mysterious 'code 357'.
- "The error message that just made sense to the person who wrote it. At the time written. For a few days."
- Back and forth with the dev team, but our devops sense was tingling.
- Took a sample and fired up a REPL.
- Ended up with a list of tuples:
  - First element: regexp to match
  - Second element: handler to transform matches into data
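
For readers who missed the last episode, the list of tuples might look roughly like the sketch below. The second regex, the `:login` kind, and the names `detail-patterns` and `parse-message` are hypothetical, made up for illustration; only the "code 357" message shape and the `:code-357/user` key come from the sample at the end of these notes.

```clojure
;; Each tuple pairs a regexp with a handler that turns the match into data.
(def detail-patterns
  [[#"transaction failed while updating user (\w+): code 357"
    (fn [[_ user]] {:kind :code-357 :code-357/user user})]
   ;; Hypothetical second message type, only here to show a second tuple.
   [#"user (\w+) logged in"
    (fn [[_ user]] {:kind :login :login/user user})]])

(defn parse-message
  "Tries each [regexp handler] tuple in order and returns the data from
  the first one that matches, or nil when none match."
  [message]
  (some (fn [[re handler]]
          (when-let [match (re-find re message)]
            (handler match)))
        detail-patterns))

(parse-message "transaction failed while updating user joe: code 357")
;; => {:kind :code-357, :code-357/user "joe"}
```

Adding a new message type means adding one more tuple; the matching loop stays the same.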
- (02:00) It's running slower and slower, the bigger the log file we analyze.
- "This is a small 4-person tech company." "Where we can use new technologies in the same decade that they were created?" "Yes!"
- Problem: No one turned on log rotation! The log file is 7G and our application crashes.
- "I think we should get lazy."
- "Work harder by getting lazier."
- "Haskell was lazy before it was cool!"
- Each line contains all the information we need, so we can process them one at a time.
- (04:30) Eager versus lazy is like the difference between push and pull.
- Giving someone a big bag of work to do, or having them grab more work as they finish.
- The thing doing the work needs a little more to work on, so it pulls it in.
- Clojure helps with this. It gives us an abstraction so we don't have to see the I/O happening when we do our processing.
- When your `map` of a lazy sequence needs more data, it gets it on demand.
- Clojure core is built to support this lazy style of processing.
- File I/O in Java is lazy in the same way. It reads data into a buffer and when that buffer is used up, more is read.
- Lazy processing is like a bucket brigade. At the head, you pour out your bucket and the person next to you notices the empty bucket and fills it up. Then this is repeated down the line as each bucket demands to be filled.
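
The "pull" idea can be watched at the REPL with a small sketch. The `numbers` function and the `realized` counter are invented for this illustration; they stand in for a source that does work (like I/O) each time it is asked for one more element.

```clojure
;; Counts how many elements the source has actually been asked to produce.
(def realized (atom 0))

(defn numbers
  "An unchunked lazy sequence n, n+1, n+2, ... that bumps the counter
  every time one more element is pulled."
  [n]
  (lazy-seq
    (swap! realized inc)
    (cons n (numbers (inc n)))))

;; Building the pipeline pulls nothing yet.
(def doubled (map #(* 2 %) (numbers 0)))
(def pulled-before @realized)   ; 0

;; Consuming three results pulls exactly three elements from the source.
(def first-three (doall (take 3 doubled)))   ; (0 2 4)
(def pulled-after @realized)    ; 3
```

The sequence is infinite, yet only the three elements that `take` demanded were ever produced. That is the bucket brigade: work happens only when the next step asks for it.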
- (07:55) Let's make our code lazy.
- Current `lines` function slurps in the file and splits on newline.
- Idea: Convert it to open the file and return a lazy sequence using `line-seq`.
- The return value can be threaded through the rest of our pipeline.
- Each step of our pipeline is lazy, and the start is lazy, so the whole process should be lazy.
- "It's lazy all the way."
- Problem: We run it, and BOOM, we get an I/O error.
- What happened? We were using the `with-open` macro, which closes the file handle after the body is complete.
- Since the I/O is delayed until we consume the sequence, the file is already closed by the time we start reading.
- "The ability to pull more I/O out of the file has been terminated."
- "Nothing at all, which is a lot less useful than something."
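
The failure can be reproduced at the REPL without a log file. This sketch assumes an in-memory `StringReader` in place of the real file, and `lazy-lines` is a made-up name standing in for the `lines` function.

```clojure
(import '(java.io BufferedReader StringReader IOException))

(defn lazy-lines
  "Returns the lazy seq from inside with-open, so the reader is already
  closed by the time the caller consumes the sequence."
  [text]
  (with-open [in (BufferedReader. (StringReader. text))]
    (line-seq in)))

;; Forcing the sequence outside with-open tries to read from a closed reader.
(def result
  (try
    (doall (lazy-lines "one\ntwo\nthree"))
    (catch IOException e
      (.getMessage e))))
;; result => "Stream closed"
```

One wrinkle worth knowing: `line-seq` reads the first line right away, so calling only `first` on the result would appear to work. The error shows up when the second line is pulled.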
- (12:29) Rather than having a `lines` function, why don't we just bring that code into the `summary` function?
- Entire process is wrapped in a `with-open` so that all steps including summary complete before the file is closed.
- Takes a filename and returns an incident count by user.
- It does all that we want, but it's too chunky. We're doing too much in the function.
- We usually want to move I/O to the edges and this commingles it with our logic.
- "We just invited I/O to come move into the middle of our living room."
- I/O and summary are in the same function, so to make a new summary, we have to duplicate everything.
- We could split out the guts, extract the general and detailed parsing into a separate function. For reuse.
- This means you are only duplicating the `with-open` and `line-seq` for each new summary.
- (16:29) How can we stop duplicating the `with-open`? To separate that idiom into just one place.
- If you can't return the `line-seq`, is there a way we can hand in the logic we need all at once?
- Idea: Make a function that does the `with-open` and takes a function.
- "Let's make it higher order."
- We hand in the "work to do" as a function.
- "What should we call it? How about `process`. That's nice and generic."
- We turn the problem of abstraction into the problem of writing a function that takes a line sequence and produces a result.
- Any functions that take and produce a sequence, including those that we wrote, can be used to write this function.
- Clojure gives us several ways of composing functions together; we'll use `->>` (the thread-last macro) in this case.
- As we improve the vocabulary for working with lines, our ability to express the function gains power and conciseness.
- Design tension: if there is something that needs to be done for every summary, it can be pushed into the `process` function.
- The downside to that is that we sign up for that for every summary, and that might not be appropriate for the ones we haven't written yet.
- We opt for making it easier to express and compose the function passed in.
- We can still make a function that takes a filename and returns a summary, but the way we construct that function is through composition of transforms.
- We can pre-bake a few transforms into shorter names if we want to use them repeatedly.
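
Pre-baking transforms could look something like this sketch. It uses `comp` and `partial` instead of `->>` as one of the other ways to compose, and it works on in-memory data so it runs without a log file. The names `only-code-357`, `users`, and `code-357-counts` are made up for the example; the map keys follow the code sample at the end of these notes.

```clojure
;; Already-parsed entries, standing in for the output of the parsing step.
(def entries
  [{:kind :code-357 :code-357/user "joe"}
   {:kind :other}
   {:kind :code-357 :code-357/user "sally"}
   {:kind :code-357 :code-357/user "joe"}])

;; Small named transforms, each taking a sequence and returning a sequence.
(def only-code-357 (partial filter #(= :code-357 (:kind %))))
(def users (partial map :code-357/user))

;; comp applies right to left: filter, then extract users, then count.
;; frequencies is eager, so the whole pipeline finishes when it is called.
(def code-357-counts (comp frequencies users only-code-357))

(code-357-counts entries)
;; => {"joe" 2, "sally" 1}
```

Each short name becomes a word in the vocabulary for working with lines, and a new summary is just a new composition.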
- (23:20) We will still run into the I/O error problem if we're not careful.
- The function that we pass to `process` needs to end with an eager evaluation.
- If all we do is transform with lazy functions, the I/O won't start before the list is returned.
- `group-by` or `frequencies` will suffice, but if you don't have one of those, reach for `doall`.
- "You gotta give it a good swift kick with `doall`."
- Style point: `doall` at the beginning or at the end of the thread? We like it at the end.
- (26:07) We have everything we need.
- Lazy so we don't pull in the entire file.
- I/O sits in one function.
- We have control over when we're eager.
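
A quick REPL sketch of the difference between a lazy step and an eager one. The var names are invented for the example, and `realized?` is used here only as a way to peek at whether a lazy sequence has done its work yet.

```clojure
;; map is lazy: defining the result does no work.
(def lazy-result (map inc [1 2 3]))
(def before (realized? lazy-result))   ; false

;; doall walks the whole sequence, forcing the work to happen now.
(doall lazy-result)
(def after (realized? lazy-result))    ; true

;; frequencies is eager: it returns a finished map, not a lazy sequence.
(def counted (frequencies ["joe" "sally" "joe"]))
;; counted => {"joe" 2, "sally" 1}
```

Inside `with-open`, the eager step has to happen before the body returns. Ending the thread with `frequencies`, `group-by`, or `doall` makes sure of that.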
Message Queue discussion:
- (26:38) Long-time listener Dave sent us a code sample!
- An alternative implementation for `parse-details` that doesn't use macros.
- Top level is an `or` macro.
- Inside the `or`, each regex is matched in a `when-let`, the body of which uses the matches to construct the detailed data.
- If the regex fails to match, `nil` is returned and the `or` will move on to the next block.
- We tend to think of `or` as only for booleans, but it works well for controlling program flow as well.
- The code is very clean and concise. And it only uses Clojure core.
- "Without dipping into macro-land... Not that there's anything wrong with that."
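
Dave's actual code isn't reproduced in these notes, so the following is only a sketch of the shape described above. The function name `parse-message-or`, the second regex, and the `:login` kind are hypothetical; the "code 357" message and the `:code-357/user` key come from the sample below.

```clojure
(defn parse-message-or
  "Tries each pattern in turn. A when-let returns nil when its regex does
  not match, so the or moves on to the next block."
  [message]
  (or
    (when-let [[_ user] (re-find #"transaction failed while updating user (\w+): code 357"
                                 message)]
      {:kind :code-357 :code-357/user user})
    ;; Hypothetical second message type, only here to show the fall-through.
    (when-let [[_ user] (re-find #"user (\w+) logged in" message)]
      {:kind :login :login/user user})))

(parse-message-or "transaction failed while updating user sally: code 357")
;; => {:kind :code-357, :code-357/user "sally"}
```

Because `or` returns the first truthy value, the order of the blocks is the order of precedence, and a message that matches nothing yields `nil`.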
Clojure in this episode:
- `slurp`, `with-open`, `line-seq`
- `->>`
- `or`, `when-let`
- `map`, `filter`
- `group-by`, `frequencies`
- `doall`
- `clojure.string/split-lines`
Code sample from this episode:
(ns devops.week-02
  (:require
    [clojure.java.io :as io]
    [devops.week-01 :refer [parse-line parse-details]]))

; Parsing and summarizing

(defn parse-log
  [raw-lines]
  (->> raw-lines
       (map parse-line)
       (filter some?)
       (map parse-details)))

(defn code-357-by-user
  [lines]
  (->> lines
       (filter #(= :code-357 (:kind %)))
       (map :code-357/user)
       (frequencies)))

; Failed Attempt: returning from with-open

(defn lines
  [filename]
  (with-open [in (io/reader filename)]
    (line-seq in)))

(defn count-by-user
  [filename]
  (->> (lines filename)
       (parse-log)
       (code-357-by-user)))

; Throws IOException "Stream closed"
#_(count-by-user "sample.log")

; Works, but I/O is coupled with the logic.

(defn count-by-user
  [filename]
  (with-open [in (io/reader filename)]
    (->> (line-seq in)
         (parse-log)
         (doall)
         (code-357-by-user))))

#_(count-by-user "sample.log")

; Separates out I/O. Allows us to compose the processing.

(defn process-log
  [filename f]
  (with-open [in (io/reader filename)]
    (->> (line-seq in)
         (f))))

; Look at the first 10 lines that parsed
#_(process-log "sample.log" #(->> % parse-log (take 10) doall))

; Count up all the "code 357" errors by user

(defn count-by-user
  [filename]
  (process-log filename #(->> % parse-log code-357-by-user)))

#_(count-by-user "sample.log")
Log file sample:
2019-05-14 16:48:55 | process-Poster | INFO | com.donutgram.poster | transaction failed while updating user joe: code 357
2019-05-14 16:48:56 | process-Poster | INFO | com.donutgram.poster | transaction failed while updating user sally: code 357
2019-05-14 16:48:57 | process-Poster | INFO | com.donutgram.poster | transaction failed while updating user joe: code 357