Two Guys Arguing

7 Rules for Writing Clojure Programs

Posted in clojure by youngnh on 07.26.10

Over the past 5 months, I’ve had the incredible opportunity at Revelytix to write Clojure every day and get paid for it. 5 months is an incredibly short time to pretend to have learned anything, but I can feel the beginnings of a style emerge in my programming and while writing a small program some ideas congealed into actual words that I thought I’d capture here.

Update: Ugh. I really messed up. As has been noted in the comments below, on Hacker News and even on Twitter, my final solution is much (much) slower thanks to its not one, but two sorts. In the end, the whole thing is doubly redundant, as clojure.contrib.seq-utils already implements a function ‘frequencies’, which will be in 1.2’s clojure.core. It uses ‘reduce’ and you should too. Don’t fall in love with your code, kids; it can’t love you back.
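For the curious, here is roughly what the reduce approach looks like. This is a minimal sketch, not the actual source of ‘frequencies’; the name char-counts is made up for illustration, and the second half assumes Clojure 1.2 or later, where frequencies lives in clojure.core.

```clojure
;; A reduce-based counter: one pass over the input, no sorting.
;; `char-counts` is a hypothetical name used only in this sketch.
(defn char-counts [s]
  (reduce (fn [counts c]
            ;; bump the count for c, starting from 0 if unseen
            (assoc counts c (inc (get counts c 0))))
          {}
          s))

(char-counts "abcdaabccc")
;; => {\a 3, \b 2, \c 4, \d 1}

;; With Clojure 1.2+, the built-in does the counting, leaving one sort
;; to order the result by descending count.
(sort-by val > (frequencies "abcdaabccc"))
;; => ([\c 4] [\a 3] [\b 2] [\d 1])
```

Only one sort remains, and it runs over the distinct characters rather than the whole input.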

#1 – Your brain will think in steps. It can’t help it
#2 – loop/recur is not a code smell, but be suspicious of it
#3 – Transformations of lists, not recursion, are key
#4 – Don’t be lazy
#5 – One liners are hard to read
#6 – Use -> and ->> to make one liners easier to read
#7 – Don’t use -> or ->>

The program that brought these ideas to life was a small utility I needed to read a file and print out the set of characters contained within along with the number of occurrences of each character. Pretty simple.

Rule #1 – Your brain will think in steps.

I consider myself to be a fairly competent functional programmer. Of course, my first thoughts at solving this character counting program ran along these lines: Pull each character out of a string individually and increment a count for each one. I even put together an eight line (8!) function to do exactly that:

(defn char-counter [str]
  (loop [counts {}
         s str]
    (if-not (empty? s)
      (let [c (first s)]
        (recur (assoc counts c (inc (get counts c 0)))
               (rest s)))
      counts)))

Most programmers would stop here. I wish I could.

Rule #2 – Be suspicious of loop/recur

One-liners are the epitome of brevity, and brevity, the epitome of wit. Technically speaking, all Clojure s-expressions are one-liners since they are merely nested lists. But this is similar to saying that all C programs could be one-liners thanks to that handy ; character. Clojure has a natural kind of indentation and line-breaking. let and loop and things with binding forms all naturally form multi-line statements in Clojure. Not all programs are expressible as one-liners, but beyond a doubt, no one-liners use loop/recur.

Clojure, and more generally, functional programming, gain their power from their expressiveness. In drab, unimaginative Java, or even more awesome programming languages like Python and Ruby, the programmer has very little power over what concepts they can express directly in the language. At the very least, these concepts must be formed by some combination of syntax and language features. Clojure concepts, almost without fail, exist as individual symbols. When I decided that I wanted to make my eight-liner a one-liner, I was really deciding to express my program with a minimal set of complementary and orthogonal concepts.

Rule #3 – Transformation not recursion

Functional programming is about functions, right? So what could be more functional than a function calling itself? Just read the Wikipedia article on fixed-point combinators, the theoretical functions that allow functions to call themselves; it’s chock full of lambdas. Real functional programmers use recursion.

Recursion has its place in functional programming, but it’s not pervasive. Recursive-ness (recursivity?) is a property of the problem being solved. Repeating, nested computations on similar data structures are usually good candidates for recursion. Our problem is not. Instead, I reformulated how I might solve this problem. Here, briefly, are the data structures that flashed through my mind in quick succession, forming a plan of possible action:

"abcdaabccc"

(a a a b b c c c c d)

((a a a) (b b) (c c c c) (d))

((a 3) (b 2) (c 4) (d 1))

((c 4) (a 3) (b 2) (d 1))

All lists. And each transformation vaguely indicates what Clojure function I should apply.

"abcdaabccc"
sort
(a a a b b c c c c d)
split it up
((a a a) (b b) (c c c c) (d))
count
((a 3) (b 2) (c 4) (d 1))
sort
((c 4) (a 3) (b 2) (d 1))

Rule #4 – Don’t be Lazy!

I couldn’t just write (sort (split-it-up (count (sort "abcdaabccc")))), not only because that would have evaluated my desired transformations in the wrong order, but also because that’s a broad-strokes solution; the details would get in the way of that actually working. Here’s what I came up with instead:

(defn char-counter [str]
  (sort-by second >
           (map #(list (first %) (count %))
                (split-it-up #(not (= %1 %2))
                             (sort str)))))

I didn’t include my definition of split-it-up here, since the way I wrote it was a microcosm of bad style that suffers recursively from all the problems this blog post is trying to address. Consider the above code snippet what not to do.

My second attempt suffered from laziness. Not the kind that you can solve with a doall but the kind that encourages you to use a whole lot of helper defns and lambdas. Rule #4 is not about evaluation. It is recognizing that the distillation of a concept can be no more pure in Clojure than when it is expressed as a single symbol. Just because you can think of two or three intellectually easy ways to write something doesn’t mean you’ve done it justice.

Above, I leaned on a helper function, split-it-up, that weighed in at as many lines of code as my original solution to this problem. Also, the (map #(list (first %) (count %)) ...) call was an easy way to get the two values I needed from every sublist. It’s readable and it works, but it’s lazy.

Ask yourself why there isn’t a concept for what you’re trying to write already. Why isn’t it already a built-in function? Always push to use a single symbol, a function you didn’t write, where you might otherwise invoke one you did or use the ubiquitous map with an anonymous function.

Rule #5 – One liners are hard to read

The calls to sort were already well known concepts for me and work well here. The split-it-up and atrocious (map #(list (first %) (count %)) calls needed to go.

There are some guidelines to finding functions that you didn’t know existed. Brute-force search is a pretty good start. clojure.core isn’t all that large. Starting from a similar function is a good strategy for breaking up your search-space. In my case, I had heard of partition and knew its operation to be very similar to what I wrote split-it-up to do. Clojure 1.1 doesn’t have it, but the to-be-released-any-day-now Clojure 1.2 has partition-by which turned out to be exactly what I needed.
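To make the difference between the two concrete, here is a small REPL sketch. It assumes Clojure 1.2 or later so that partition-by is available in clojure.core.

```clojure
;; partition cuts a sequence into fixed-size chunks,
;; dropping any incomplete chunk at the end.
(partition 2 [1 2 3 4 5])
;; => ((1 2) (3 4))

;; partition-by starts a new chunk every time (f item) changes value.
;; With identity as f, runs of equal items end up grouped together.
(partition-by identity (sort "abcdaabccc"))
;; => ((\a \a \a) (\b \b) (\c \c \c \c) (\d))

;; Without the sort, only adjacent duplicates are grouped.
(partition-by identity "aabaa")
;; => ((\a \a) (\b) (\a \a))
```

That last case is why the sort has to come first in this approach.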

I was totally lost as to how to further distill the second expression. I knew I wanted something that used first and count and little else. The call to list and placing the arguments for evaluation were plumbing, something that wasn’t directly related to my problem. This seemed to me to be a new concept I had no tool for. Provided a list of functions, I wanted to get a new function that returned the result of collecting their application.

I considered a couple of approaches. do is a form similar in spirit. Besides the fact that it’s not a pure function, it takes some computations and performs each one. I could have written my own combinator and solved the plumbing case in general with something like:

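;; Given a collection of functions, return a function that applies
;; each of them to the same arguments and collects the results.
(defn frobulate [fs]
  (fn [& vals]
    (map #(apply % vals) fs)))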

But for some reason it got wedged in my head that perhaps iterate might hold the key to a terse expression of this idea. It didn’t. Two functions down from that in the docs however, I found juxt. Exactly what I needed.
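A quick sketch of juxt at the REPL, using the same sublists as the example data above. Unlike a hand-rolled combinator that takes a collection of functions, juxt takes the functions as separate arguments and returns the results in a vector.

```clojure
;; juxt returns a function that applies each given function
;; to the same arguments and collects the results in a vector.
((juxt first count) [\a \a \a])
;; => [\a 3]

;; Mapped over the grouped characters, it produces the (char count) pairs.
(map (juxt first count) [[\a \a \a] [\b \b] [\c \c \c \c] [\d]])
;; => ([\a 3] [\b 2] [\c 4] [\d 1])
```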

A one-liner emerged:

(sort-by second > (map (juxt first count) (partition-by identity (sort "abcdaabccc"))))

Rule #6 – -> and ->> are your friend

It still reads backward. The revelation that I should be using ->> came to me way back at Rule #3. I was applying multiple transformations to a single sequence, which the ->> macro exists to make more readable.

(->> str
     sort
     (partition-by identity)
     (map (juxt first count))
     (sort-by second >))
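If the two threading macros blur together, this sketch may help. The var name input is made up for illustration; everything else is clojure.core.

```clojure
;; -> threads the value in as the FIRST argument of each form,
;; ->> threads it in as the LAST argument.
(-> 10 (- 3))    ; same as (- 10 3)
;; => 7
(->> 10 (- 3))   ; same as (- 3 10)
;; => -7

;; Sequence functions such as map and sort-by take the collection
;; last, which is why ->> is the one that fits here.
(def input "abcdaabccc")  ; hypothetical name for this sketch

(= (->> input
        sort
        (partition-by identity)
        (map (juxt first count))
        (sort-by second >))
   (sort-by second > (map (juxt first count) (partition-by identity (sort input)))))
;; => true
```

The threaded form and the nested one-liner are the same computation; only the reading order changes.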

Rule #7 – Don’t use -> and ->>

If you’ll notice, we’ve now transformed our crappy, procedural series of steps into a beautifully constructed and elegantly distilled…series of steps. Don’t worry so much about it. This is, to my eye, another one of the great circle-of-life symmetries (ouroboros?) of computation. Code is data, data is code, procedural steps beget functional evaluation beget a series of procedural steps. Just to be clear though, I still think you should use ->>. In our case, I like its readability better than our one-liner. But if you sat down next to me, explained your fantastic transformative functional ideas and then the next 4 characters you typed were the threading macro, I’d take the keyboard away from you and bludgeon you to death with it. Because of Rule #1, Rule #5 becomes a dangerous weapon. Most of the time, a let would suffice to make our first, wild attempts readable and flexible. In our case, we could:

(let [a (sort str)
      b (partition-by identity a)
      c (map (juxt first count) b)]
  (sort-by second > c))

but it’s pathological.

Parting thought. juxt was a bit of a revelation for me. 5 months ago, I never would have even looked for an operator to solve what I could write myself to suit my needs just as well. The cold, stark expressiveness of (juxt first count) makes me wonder if, now that you hold in your hands the 4 steps of character counting: sort, (partition-by identity), (map (juxt first count)) and (sort-by second >), you could combine them with one other operator (instinctively I want to say reduce?) to write a one liner that doesn’t do so much as be.
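One possible reading of that, offered as a sketch rather than an answer: comp with partial glues the four steps into a function without ever naming its argument, and reduce can do the same by folding the input through a vector of steps. The names char-counter* and steps are made up for illustration.

```clojure
;; comp applies its functions right to left, so the steps are listed
;; in reverse; partial pins the leading arguments of each step.
(def char-counter*   ; hypothetical name for this sketch
  (comp (partial sort-by second >)
        (partial map (juxt first count))
        (partial partition-by identity)
        sort))

(char-counter* "abcdaabccc")
;; => ([\c 4] [\a 3] [\b 2] [\d 1])

;; The reduce flavor: feed the value through each step in order.
(def steps           ; hypothetical name for this sketch
  [sort
   (partial partition-by identity)
   (partial map (juxt first count))
   (partial sort-by second >)])

(reduce (fn [value step] (step value)) "abcdaabccc" steps)
;; => ([\c 4] [\a 3] [\b 2] [\d 1])
```

Both versions still pay for the two sorts.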
