Two Guys Arguing

7 Rules for Writing Clojure Programs

Posted in clojure by youngnh on 07.26.10

Over the past 5 months, I’ve had the incredible opportunity at Revelytix to write Clojure every day and get paid for it. 5 months is an incredibly short time to pretend to have learned anything, but I can feel the beginnings of a style emerge in my programming and while writing a small program some ideas congealed into actual words that I thought I’d capture here.

Update: Ugh. I really messed up. As it has been noted in the comments below, on Hacker News and even Twitter, my final solution is much (much) slower thanks to it’s not one, but two sorts. In the end, the whole thing is doubly redundant as clojure.contrib.seq-utils implemented a function ‘frequencies’ which will be in 1.2’s clojure.core. It uses ‘reduce’ and you should too. Don’t fall in love with your code kids, it can’t love you back.

#1 – Your brain will think in steps. It can’t help it
#2 – loop/recur is not a code smell, but be suspicious of it
#3 – Transformations of lists, not recursion is key
#4 – Don’t be lazy
#5 – One liners are hard to read
#6 – Use -> and ->> to make one liners easier to read
#7 – Don’t use -> or ->>

The program that brought these ideas to life was a small utility I needed to read a file and print out the set of characters contained within along with the number of occurrences of each character. Pretty simple.

Rule #1 – Your brain will think in steps.

I consider myself to be a fairly competent functional programmer. Of course, my first thoughts at solving this character counting program ran along these lines: Pull each character out of a string individually and increment a count for each one. I even put together an eight line (8!) function to do exactly that:

(defn char-counter [str]
  (loop [counts {}
         s str]
    (if-not (empty? s)
      (let [c (first s)]
        (recur (assoc counts c (inc (get counts c 0)))
               (rest s)))
      counts)))

Most programmers would stop here. I wish I could.

Rule #2 – Be suspicious of loop/recur

One-liners are the epitome of brevity, and brevity, the epitome of wit. Technically speaking, all Clojure s-expressions are one-liners since they are merely nested lists. But this is similar to saying that all C programs could be one-liners thanks to that handly ; character. Clojure has a natural kind of indentation and line-breaking. let and loop and things with binding forms all naturally form multi-line statements in Clojure. Not all programs are expressible as one-liners, but beyond a doubt, no one-liners use loop/recur.

Clojure, and more generally, functional programming, gain their power from their expressiveness. In drab, unimaginative Java, or even more awesome programming languages like Python and Ruby, the programmer has very little power over what concepts they can express directly in the language. At the very least, these concepts must be formed by some combination of syntax and language features. Clojure concepts, almost without fail, exist as individual symbols. When I decided that I wanted to make my eight-liner a one-liner, I was really deciding to express my program with a minimal set of complementary and orthogonal concepts.

Rule #3 – Transformation not recursion

Functional programming is about functions, right? So what could be more functional than a function calling itself? Just read the Wikipedia article on fixed-point combinators, the theoretical functions that allow functions to call themselves, it’s chock full of lambdas. Real functional programmers use recursion.

Recursion has its place in functional programming, but it’s not pervasive. Recusive-ness (recursivity?) is a property of the problem being solved. Repeating, nested computations on similar data structures are usually good candidates for recursion. Our problem is not. Instead, I reformulated how I might solve this problem. Here’s briefly the data structures that flashed through my mind in quick succession, forming a plan of possible action:

"abcdaabccc"

(a a a b b c c c c d)

((a a a) (b b) (c c c c) (d))

((a 3) (b 2) (c 4) (d 1))

((c 4) (a 3) (b 2) (d 1))

All lists. And each transformation vaguely indicates what Clojure function I should apply.

"abcdaabccc"
sort
(a a a b b c c c c d)
split it up
((a a a) (b b) (c c c c) (d))
count
((a 3) (b 2) (c 4) (d 1))
sort
((c 4) (a 3) (b 2) (d 1))

Rule #4 – Don’t be Lazy!

I couldn’t just write (sort (split-it-up (count (sort "abcdaabccc")))) not only because that would have evaluated my desired transformations in reverse order, but also because that’s a broad-strokes solution, the details would get in the way of that actually working. Here’s what I came up with instead:

(defn char-counter [str]
  (sort-by second >
           (map #(list (first %) (count %))
                (split-it-up #(not (= %1 %2))
                             (sort str)))))

I didn’t include my definition of split-it-up here, since the way I wrote it was a microcosm of bad style that suffers recursively from all the problems this blog post is trying to address. Consider the above code snippet what not to do.

My second attempt suffered from laziness. Not the kind that you can solve with a doall but the kind that encourages you to use a whole lot of helper defns and lambdas. Rule #4 is not about evaluation. It is recognizing that the distillation of a concept can be no more pure in Clojure than when it is expressed as a single symbol. Just because you can think of two or three intellectually easy ways to write something doesn’t mean you’ve done it justice.

Above, a helper function split-it-up that weighed in at as many lines of code as my original solution to this problem. Also, the (map #(list (first %) (count %) was an easy way to get the two values I needed from every sublist. It’s readable and it works, but it’s lazy.

Ask yourself why there isn’t a concept for what you’re trying to write already. Why isn’t it already a built-in function? Always push to use a single symbol, a function you didn’t write, where you might otherwise invoke one you did or use the ubiquitous map with an anonymous function.

Rule #5 – One liners are hard to read

The calls to sort were already well known concepts for me and work well here. The split-it-up and atrocious
(map #(list (first %) (count %)) calls needed to go.

There are some guidelines to finding functions that you didn’t know existed. Brute-force search is a pretty good start. clojure.core isn’t all that large. Starting from a similar function is a good strategy for breaking up your search-space. In my case, I had heard of partition and knew it’s operation to be very similar to what I wrote split-it-up to do. Clojure 1.1 doesn’t have it, but the to-be-released-any-day-now Clojure 1.2 has partition-by which turned out to be exactly what I needed.

I was totally lost as to how to further distill the second expression. I knew I wanted something that used first and count and little else. The call to list and placing the arguments for evaluation were plumbing, something that wasn’t directly related to my problem. This seemed to me to be a new concept I had no tool for. Provided a list of functions, I wanted to get a new function that returned the result of collecting their application.

I considered a couple of approaches. do is a form similar in spirit. Besides the fact that it’s not a pure function, it takes some computations and performs each one. I could have written my own combinator and solved the plumbing case in general with something like:

(defn frobulate [fs]
  (fn [& vals]
    (map #(apply % vals) fs))

But for some reason it got wedged in my head that perhaps iterate might hold the key to a terse expression of this idea. It didn’t. Two functions down from that in the docs however, I found juxt. Exactly what I needed.

A one-liner emerged:

(sort-by second > (map (juxt first count) (partition-by identity (sort "abcdaabccc"))))

Rule #6 – -> and ->> are your friend

It still reads backward. The revelation that I should be using ->> came to me way back at Rule #3. I was applying multiple transformations to a single sequence, which the ->> macro exists to make more readable.

(->> str
     sort
     (partition-by identity)
     (map (juxt first count))
     (sort-by second >))

Rule #7 – Don’t use -> and ->>

If you’ll notice, we’ve now transformed our crappy, procedural series of steps into a beautifully constructed and elegantly distilled…series of steps. Don’t worry so much about it. This is, to my eye, another one of the great circle-of-life symmetries (ourobous?) of computation. Code is data, data is code, procedural steps beget functional evaluation beget a series of procedural steps. Just to be clear though, I still think you should use ->>. In our case, I like it’s readability better than our one-liner. But if you sat down next to me, explained your fantastic transformative functional ideas and then the next 4 characters you typed were the threading macro, I’d take the keyboard away from you and bludgeon you to death with it. Because of Rule #1, Rule #5 becomes a dangerous weapon. Most of the time, a let would suffice to make our first, wild attempts readable and flexible. In our case, we could:

(let [a (sort str)
      b (partition-by identity a)
      c (map (juxt first count) a)]
  (sort-by second > c))

but it’s pathological.

Parting thought. juxt was a bit of a revelation for me. 5 months ago, I never would have even looked for an operator to solve what I could write myself to suit my needs just as well. The cold, stark expressiveness of
(juxt first count) makes me wonder if now that you hold in your hands the 4 steps of character counting: sort (partition-by identity) (map (juxt first count)) and (sort-by second >) you could combine them with one other operator, (instinctively I want to say reduce?) to write a one liner that doesn’t do so much as be.

24 Responses

Subscribe to comments with RSS.

  1. Duncan said, on 07.26.10 at 3:24 pm

    While your final solution is a shorter (and prettier), isn’t it also a lot slower?

    Instead of one linear scan of ‘str’, now, you now do 3 scans (SORT, PARTITION-BY, MAP)

  2. Alex Miller said, on 07.26.10 at 3:30 pm

    Nice post Nate! At the very end, are you looking for comp to compose those functions?

    Because this is the internet, I’ll assume for a second though that you’re totally wrong and throw stones at this answer. In particular, needing to sort all of the data very much reduces the general applicability of this function due to it not being amenable to lazy (perhaps even infinite) streams (for computing a running character historgram on a stream) and also due to its performance (and memory!) characteristics. The initial recursive solution is actually pretty good in that respect however. I probably would have attempted some sort of reduce-like solution as the second step after the loop/recur.

    One down side of a recursive / reduce-like solution though is that while they are good for streaming, they are bad for parallelism. I know of no better source for this train of thought than Guy Steele’s talk last year on organizing code for parallelism. In that solution, you want to think of the full character set not as a stream but as the leaves of a tree to be computed in parallel via divide and conquer (mapreduce). Would be interesting to see such a solution for comparison. [Of course, Guy will be doing an updated version of this talk as a keynote at Strange Loop this year. :)]

  3. Alexey Romanov said, on 07.26.10 at 4:00 pm

    Well, after reading the problem statement I immediately thought: “This is a simple fold.” In Scala (since I don’t know Clojure syntax):

    charStream.foldLeft(Map.empty[Char, Int].withDefaultValue(0)) { (map, c) => map.updated(c, map(c) + 1) }

  4. Dave said, on 07.26.10 at 4:13 pm

    In Clojure, folds are accomplished with ‘reduce’. The core library contains a specialized reduce called ‘merge-with’ that is especially useful in this case:

    clojure.core/merge-with
    ([f & maps])
    Returns a map that consists of the rest of the maps conj-ed onto
    the first. If a key occurs in more than one map, the mapping(s)
    from the latter (left-to-right) will be combined with the mapping in
    the result by calling (f val-in-result val-in-latter).

    And using it to solve this problem:

    (apply merge-with + (map (fn [c] {c 1}) “abcdaabccc”))

  5. Justin Kramer said, on 07.26.10 at 4:47 pm

    Check out ‘frequencies’ (clojure.core in 1.2, clojure.contrib.seq-utils in 1.1):

    (sort-by val > (frequencies “abcdaabccc”))

  6. Aria said, on 07.26.10 at 5:32 pm

    Or you could just do (frequencies str).

  7. Peter Tillemans said, on 07.26.10 at 6:48 pm

    To get the same output I need

    (sort (fn [a b] (> (val a) (val b))) (frequencies “abcdaabccc”))

    But it is a great read to see the meandrings of coming to a solution.

  8. ad said, on 07.26.10 at 8:43 pm

    This is probably the most poorly written and disorganized article I’ve read in awhile.

  9. Alex Miller said, on 07.26.10 at 9:13 pm

    @ad: I absolutely disagree. I think it’s very well written. Just because it may not come up with the ultimate solution doesn’t mean that there isn’t value in putting your work into the world.

    I think many Clojure (or FP) programmers can gain value from seeing how ANY code evolves and even more from all of the comments and additional examples others add. How many people were introduced to loop-recur, partition-by, sort/sort-by, juxt, ->/->> by the article or to reduce, merge-with, frequencies, etc in the comments? Or with the idea of transformation vs recursion? Or with the idea of using the threading macros to make code read “forwards” instead of “backwards”. Or with the whole meta idea of finding the most natural way to write a functional process?

    It takes great courage to write an article like this and expose yourself to scrutiny and is a fantastic way to learn.

  10. Ramakrishnan said, on 07.27.10 at 12:50 am

    As several people said, frequencies can be used, or you can implement one quite easily:

    (reduce #(assoc %1 %2 (inc (get %1 %2 0))) {} “aabcdcdcd”)
    => {\d 3, \c 3, \b 1, \a 2}

    I agree with Alex Miller. Appreciate you posting this.

  11. Gary Overgard said, on 07.27.10 at 6:08 am

    As a beginner struggling with how to think in Clojure I really enjoyed this article -and the comments that followed showing other solutions. Having some basic rules of thumb for what is a good approach, and what is not, is incredibly handy. I am always on the lookout for Clojure articles that show an evolution of a problem. –thanks for the thoughts

  12. Dave F. said, on 07.27.10 at 1:47 pm

    As some folks on Hacker News noticed, ->> is there to reduce errors. Please note in the #7 example that you erroneously use “a” instead of “b” in your computation of “c”.

    The threading ops make this kind of error impossible, while maintaining code clarity. Not using them is a big mistake. They exist to reduce errors, reduce excess typing, and signal to code readers that a transformation is taking place. Not using them due to dogma is a mistake.

  13. Top Posts — WordPress.com said, on 07.27.10 at 7:08 pm

    […] 7 Rules for Writing Clojure Programs Over the past 5 months, I’ve had the incredible opportunity at Revelytix to write Clojure every day and get paid […] […]

  14. Horst said, on 12.29.10 at 5:34 am

    youngnh,

    you should properly format the code under rule #1. The “(let” line needs to have the same indentation as the last line that starts with “counts”, otherwise it’s hard to recognize as the 2 branches of the if-not.

    • youngnh said, on 12.30.10 at 7:45 am

      good catch, I fixed the spacing for that code snippet

  15. flashuac said, on 05.02.11 at 5:07 pm

    Very interesting post! :)

  16. […] that will make more sense closer near the end of this post,  everything should be immutable – here is a good read about some best practices with […]

  17. thehotelambush said, on 08.17.12 at 8:44 pm

    Just mathematically, this problem is easy to express recursively:

    Freqs(())[c] = 0

    Freqs(x.ys)[c] = Freqs(ys)[c] + (1 if x = c else 0).

    Don’t underestimate the power of recursion.

    Even when you’re not using it explicitly, you are probably using some built-in function which does (e.g. reduce).

    • youngnh said, on 08.28.12 at 9:58 am

      I’m a big fan of this notation. It is almost immediately apparent what’s happening because of how little actually happens. It is not hard to keep the whole thing in your head. It is a description of the solution itself, not a long recipe of how to get there. Base case, induction case. No muss, no fuss.

      That said, its a bit of a trip to translate a definition like that into Clojure. The language provides (just about) everything you need to do so efficiently, but, well, see for yourself if you like where this ends up.

      Here’s the closest expression of the initial definition as I could make it as a simple recursive fn

      (defn Freqs [[x & ys :as coll] c]
        (cond
         (empty? coll) 0
         (= x c) (inc (Freqs ys c))
         :else (Freqs ys c)))
      

      That definition will compute the entirety of Freqs every time you ask for c. It can avoid the repeated work by memoizing an anonymous function.

      (defn fix [f]
        ((fn [makef]
           (fn [& args]
             (apply f (makef makef) args)))
         (fn [makef]
           (fn [& args]
             (apply f (makef makef) args)))))
      
      (def Freqs
        (memoize (fix (fn [recurse [x & ys :as coll] c]
                        (cond
                         (empty? coll) 0
                         (= x c) (inc (recurse ys c))
                         :else (recurse ys c))))))
      

      Thankfully, Clojure lets you name anonymous functions so that you don’t have to resort to a fixed-point combinator.

      (def Freqs
        (memoize (fn thisfn [[x & ys :as coll] c]
                   (cond
                    (empty? coll) 0
                    (= x c) (inc (thisfn ys c))
                    :else (thisfn ys c)))))
      

      This solution still consumes stack every time we recurse and will be unable to compute Freqs on very large seqs.

      (Freqs (take 10000 (repeatedly #(rand-int 10))) 7)
      => StackOverflowError
      

      To get around that, we can use Clojure’s `trampoline` to the mix to solve it in this style.

      (defn sequentially [thunk f]
        (if (fn? thunk)
          (fn [] (sequentially (thunk) f))
          (f thunk)))
      
      (def Freqs
        (memoize (fn [coll c]
                   (trampoline
                    (fn thisfn [[x & ys :as coll] c]
                      (cond
                       (empty? coll) 0
                       (= x c) (sequentially (fn [] (thisfn ys c)) inc)
                       :else (fn [] (thisfn ys c))))
                    coll c))))
      

      However, we’re still left computing the whole of Freqs for every unique c that we pass to the function. I believe that’s still true even if we write this solution in Haskell, which will result in code that looks virtually identical to your mathematical notation and doesn’t have to manually perform the memoization and trampolining backflips above But, if you only use the function once then maybe the increased readability is worth the performance hit

      If you’re interested in a solution that clearly communicates the idea behind the solution, use the mathematical notation, but if you want to run this code, take the advice of commenters above and use clojure.core/frequencies.

  18. […] back when I was just starting Clojure, I came across a blog post that said that loop/recur was kind of like code smell, with the implication that it was a good tool […]

  19. luskwater said, on 11.14.13 at 12:09 pm

    Don’t you want the map expression to be
    c (map (juxt first count) b)
    rather than
    c (map (juxt first count) a)
    in your final code?


Leave a comment