Two Guys Arguing

The Conj

Posted in clojure by youngnh on 10.25.10

Below I’ve interspersed the wide variety of ideas I came home with from the first Clojure Conj with my own thoughts. Notes I took from the conference you’ll find in black text and you can be reasonably assured that someone said them and I wrote them down. I didn’t take notes on every talk, and of the ones I did, I’m sure I missed some details or got important ones wrong, so feel free to correct factual errors in the comments. Connections between talks and ideas I heard echoed throughout the conference I wrote in green. My own commentary and interpretations are in blue. There were a lot of links and books, projects and presentation materials that sounded interesting, I’ve collected them all as part of my clojureconj materials page, and referenced them throughout this writeup.

The article itself is fairly long, here’s the talks I cover:
Fertile Ground: The Roots of Clojure – Michael Fogus
Zippers – Luke VanderHart
(not= DSL macros)
Lisp, Functional Programming, and the State of Flow
Protocol XIII: Clojure Protocols Explained
Finger Trees: Custom Persistent Collections
Making Leiningen Work for You
New Features – Rich Hickey
Hallway Track
Lightning Talks
Keynote – Rich Hickey
Simplicity Ain’t Easy – Stu Halloway


Fertile Ground: The Roots of Clojure – Michael Fogus (@fogus)
Fogus talked about the ideas and languages that influenced Clojure. They were wide-ranging and not all directly drawn from the field of computer science. He gave a particular shout out to Ruby, Javascript, and Python as languages that prepared the world for Clojure. His talk was full of good stuff, so he didn’t dwell on this point, but it really struck a chord with me that Python in particular legitimized dynamically-typed programming. That’s a very broad statement and I’m not qualified to defend it, so I won’t, but I do believe that had things gone a little differently I might never have written a single line of interpreted code.

Questions I didn’t think to ask after Fogus’ talk:
“What will Clojure’s influence be? Will it be new ideas, or just a combination of the best old ideas? What will a language influenced by Clojure look like?”
Bonus followup: “Did Java not introduce enough new ideas and is this why there’s such a vacuum left in it’s wake?”

See also [1] and [2] as extra reading, both referenced in the talk.


Zippers – Luke VanderHart (@levanderhart)

Based on a paper [3] by Huet, Luke took us through Clojure’s zipper data structure. To create a zipper, you must supply 3 methods. For certain data structures like nested lists and vectors, these methods are very easy to implement.
Navigation and editing become a one-liner:

(-> t down right down down right right (edit f) root)

Zippers incur low creation overhead, through a really neat strategy of storing it’s “implementation” in metadata. Since many structures are navigable as zippers, the library adds metadata to the existing data structure via with-meta. This approach works because clojure structures are immutable and you don’t need your own copy of the data that only you control. Additionally, by not storing the functions in an outright data structure, less data is copied from version to version.

There’s no reason that the 3 functions you provide the zipper have to operate on existing structures: the functions could generate data on the fly if you are navigating a regular, predictable structure.


(not= DSL macros) – Christophe Grand (@cgrand)

  • Users don’t actually want a DSL.
    Rich echoed the sentiment that users don’t know what they want in his day 2 rant by saying not to allow users to ask for features, instead have them tell you what their problem is. I would add that even then, most of the time they don’t know what the actual root problem is, so you can’t listen to them about that directly either, you have to figure it out for yourself. Users are pretty good at describing the symptoms their problems are causing, though.
  • Users want the full power of Clojure to manipulate “pieces” of a DSL
  • Values are a sweet spot, design-wise
    • full power of Clojure
    • macro-enough that expression is terse
  • Macros are icing, not first steps
  • Map domain-specific semantics onto datatypes/values
    • Hide unmappable abstractions behind deftypes (in his presentation, Grand used a defrecord, but Rich during hallway discussions and Stu via twitter chimed in that types are abstractions, records are solely field/information holders)
  • You have an abstraction/semantics creator function + an “eval” function to do something with that abstraction

“Each of these slides is gold. If one doesn’t teach you something, you haven’t looked hard enough yet.” – Chris Houser

A few things he mentioned that piqued my interest are under my links section: [5] [6] and [7]

This was one of my favorite presentations, go download his slides [19] right now and forever more build your libraries like he does, Clojure will be better for it.


Lisp, Functional Programming, and the State of Flow – Tom Faulhaber (@tomfaulhaber)

Tom talked about what flow is as Mikhal Csikszentmihalyi describes it, as well as masterfully pronouncing the author’s name right on multiple occasions without missing a beat.

His first example was how natural and useful fill-queue [8] was.

His examples illustrated how Clojure lets you write at whatever level of granularity you’d like, which he had mentioned earlier in the talk was important for flow. I made the connection as he was talking, but I don’t think Tom ever really made that connection explicit nor did he instill any larger realization/connection gained from that insight. The gem of the presentation was an excellently specified problem of providing near realtime updates of commits to a source control repository. His solution was elegant and he took us through it, but the problem was so perfectly described and the proposed solution was so natural, that I feel like I could have written the code myself in a short time.

Having a clear written problem statement that is continuously updated would be incredibly useful to a team.

Later, during the lightning talks, one on Aleph [9] mentioned (and I agree) that Aleph might have been an even better match for solving this problem than the solution Tom showed us.


Protocol XIII: Clojure Protocols Explained – Sean Devlin (@fulldisclojure)

Very cool talk here on ad-hoc type inference via protocols. He implemented a protocol named Dateable and chose a single method to unify/describe all Dateables (don’t underestimate the complexity of this step, it’s important to choose a good method). He then wrote implementations for lots of concrete types. Combined with custom “cast” functions, he showed how you can then "use the right type for the job"

This is very powerful way to work with a bunch of disparate types in a single generic way, without losing the ability to use specific features of any type regardless of the “underlying” type. This really blew my mind. Given a long, you could treat it as a Joda time.

Learned during the Q&A Session:
reify creates an unnamed type (an anonymous type), use extend if you already have a type that you’d like to imbue with the protocol’s abilities.

Someone asked if you can extend one protocol to another. You can’t, but when this question was asked, Rich Hickey jumped in and noted that if you wanted protocol P to extend A, because you can’t write:

(extend P A
  (method [this] ...))

you can extend Object to a protocol A, which will be a default fallthrough case. Given the object, you can test if it satisfies protocol P and if it does, you can then extend the object on the fly to implement protocol A in terms of whatever facilities P offers. This is a technique that I’m sure we’ll see more of when Clojure protocols outnumber Java classes in variety and usefulness.


Finger Trees: Custom Persistent Collections – Chris Houser (@chrishouser)

Affectionately known and referred to throughout the conference as the single-syllable “Chouser”, Chris gave a detailed and remarkably easy to follow talk on what appears to be an incredibly dense and detailed data structure. Go download the slides, they do more justice to his talk than I could. The idea of finger trees came from a paper describing them in Haskell. Chouser eased us into thinking about them by introducing a specific case that makes dequeues very easy. In the general case, a more complicated function called a meter is used under the covers to make common list & tree operations inexpensive.

  • Every node is decorated with a count of the elements contained beneath it
  • Each level of the tree indirects once more than the previous
  • Leaf nodes hold no more than 4 elements
  • The fixed number of nodes (< 4) means that nth can be written as a O(log n) operation (vs. O(n) )
  • Splitting trees is easy, and the ability to quickly add a node to the "tail" makes arbitrary concat/modification cheap (reminiscent of zippers). Since split and concat are both log n operations modification in the middle of a finger tree is O(2 * log n + C), which has O(log n) growth characteristics

Making Leiningen Work for You – Phil Hagelberg (@technomancy)

The /checkout folder is functionality (already in? currently being added to?) lein to add features to one project that your current project will use without doing the maven dance.

Phil mentioned some community-building concepts that lein has benefitted from:

  • LHF – identify low-hanging fruit
  • document internals and maintain a hacking guide (much easier if you start on day 1)
  • accept patches promptly
  • trust people, commit rights should be easy to get

New Features – Rich Hickey (@richhickey)

All new features are in master on github, most were committed within the past few days of his talk, but all had been brewing under that delightful mop of hair for much longer.

BigInteger contagion. Rich talked about Clojure’s primitive support, the language’s new rules for handling primitive types as parameters to functions and return values from functions. To implement primitives as functions to parameters, Rich added a bunch of sub interfaces whose names reflect the types of values they take and return. IFn$ODL takes an Object and a double and returns a Long. He used a one-off Clojure function to generate the Java code of each combination out to 4 parameters of each type. He wants feedback. He encouraged Clojurians to write numeric code, and tell him if the the IFn$LOL methods are enough? Does this bring performance acceptably on par with Java? Rich also said that all Clojure programmers care about this. Because even if you don’t write numeric code, this sort of thing creates buzz at the top of the budget food chain that Clojure is a contender and that it doesn’t have to apologize for it’s performance, which tends to trickle down and green-light web projects or tool one-offs. This work was a strategic as well as a tactical improvement.

Rich is trying to get rid of annotating things as static as a way to make Clojure fast. He’s not sure if he’s there yet, so static is still around, but it sounds like it’s never been an optimization he liked adding and he’s actively trying to kill it (as when it’s dead that will mean Clojure is fast without tricks).

Clojure hasn’t gotten every detail and design decision right…yet. Rich talked about binding and threads. Currently they don’t work seamlessly together, which is a shame because less and less work is being done is a single thread. Lots of features in clojure spawn new threads and to not have those features work well with binding is a sore-spot. Ultimately, Rich noted that on the horizon are resource scopes: collections of threads, processors, files and more that are allocated as a unit of work and then released when the work is complete, decoupled from what thread things are running in or other orthogonal concerns. For now, Rich has made def work differently. (Again, on master. This work is yet unreleased). Dynamic binding has a cost (Rich started a little Socratic back and forth with the audience, prodding us to tell him how dynamic binding was implemented) and most people are paying for that cost without using it. Rich decided to change the way def works to remove that cost. def will no longer make dynamic vars by default, thereby removing the cost for the vast majority of def‘s usage. Dynamic variables will need to be declared as such, thereby maintaining the usefulness in scenarios which it is desired. The use case for dynamic variables is more and more shifting towards simplifying value-access in multi-threaded code.

In this session I learned the term “earmuffs” as descriptions for the stars around variables like *out*. (assoc brain that-info)

Also Rich mentioned, and I quickly wrote down, that to mock functions and globals during testing, you should use alter-var-root, execute the test and then alter the root back, DO NOT use binding.


Hallway Track: 200+ simultaneous presenters
I participated in a lot of fun discussions outside of sessions. Some even had something to do with Clojure or software in general.

If you want to be different, use cake/maven bitbucket and vi. The build tools, source control repos and editors Clojurians use are wide in their variance, but the overwhelmingly chosen toolset seemed to be lein, github and emacs.

I talked to some guys from Forward, a company that describes itself as “post-agile”. Another person in the conversation mentioned that he was post-agile, as well as post-OO. It struck me that both approaches were honest movements created to address real needs and presented solutions that solved real problems. However, there was a lot of disillusionment when those movements applied unilaterally failed to lead all programmers to the promised land. The ideas taken as far as they could failed their most devout practitioners. Some refused to acknowledge this, some saw it and rejected all aspects of the movements on those grounds alone.

So it is that today we are “post” agile and OO. The special irony to my ear is that agile, as I was taught it, was a methodology to eschew all methodologies. XP and TDD and Kanban and Scrum all proscribed certain ways of getting things done, but the underlying core of agile seemed to me to already be post-agile. I don’t think that message has come across loud and clear over specific failures and general disillusionment with the practice over time.


Day 2


Lightning Talks

See [16] and [17]

Infer

  • Born of a mix of research (one developer is a doctorate?/post-doc? at Stanford) and practical (the other founded a startup in machine learning and big data) That’s an awesome combo to be on either end if you can swing it
  • For linear and quadratic programming, the hard stuff that you can’t make run in O(log n)
  • "a lot of people have numerical optimization problems and don't even know it."
  • If you do know it, the Infer guys are interested have made something for you and are interested to hear from you about the problem you are solving
  • Check out [11] and [18]

Clojure Debug Toolkit

  • Effort to use more of the JDI
  • Debugging was mentioned in Bedra’s talk as a real Clojure sore-spot

Keynote – Rich Hickey (@richhickey)

This is not a methodology. It’s a rant.
Solve problems with minimal typing.
Solve it the first time you sit down to solve it (or at least you should have a chance to).

Most of the biggest problems in software are of misconception. This statement says more to me now that I’ve thought it over for a while than when I first heard it. It says that problems in the field of software are not limited to those that a computer can solve, which implies that you solve a lot of problems in software without a computer. This talk focused on a sub-category of those problems. Also, this means that software itself really only solves a single class of problems: computation, which I think is a helpful way of thinking about what you’re really writing

The step before go do it is an important step.

You will get good at what you practice. Almost without trying. You can practice thinking about things, but maybe not in the way would you practice a sport or a hobby.

Say the problem aloud. Write it down if you have to. I tend to think that thinking it is just silently saying it, but it’s not. Writing it down might not even be. There is a different part of your brain that switches on when you know someone else will hear your words

Write down, examine, and think about the characteristics of the problem. What do you know? Describe. Stu Halloway’s closing talk noted that approval and disapproval are rampant in modern language, and previously precise words that used to describe things tend to lose meaning over time as they are used more and more genrally to approve or disapprove of things.

Lots of input. You must feed your brain. Bake your ideas. Your forebrain is tactics, but your unconscious brain is strategy. The forebrain is good at reasoning after connections have been made by the unconscious brain.
Caveman forebrain: “Huh, look at all those Bison down by the water”
Caveman unconscious brain after a good night’s sleep: “We should hunt near the watering hole”
Caveman forebrain: “Yeah, duh. That’s what I said”
Don’t think that your forebrain will be all that good at creating new ideas. the fact that it can decide if a good idea holds together or not leads many to think that their forebrain can create good ideas, they’re two separate abilities

"When the facts change, I change my mind. What do you do sir?" – John Maynard Keynes

Q&A

Someone asked a question about bad ideas and getting too attached to them. Rich said to kill them, throw them away. Stop wasting time on them as soon as you know they’re bad. "they're not your children"

Rich singled out a book by Polya [12] as excellent material for practicing the skill of solving problems.


Simplicity Ain’t Easy – Stu Halloway (@stuarthalloway)

In between pulls on a frosty bottle of Newcastle, Stuart Halloway dropped some knowledge on the meaning of simple.

As a concept, it’s not subjective this is a very strong statement in today’s culture of anyone-can-say-and-argue-for-anything-on-the-internet. It is very tempting to look at the sheer number of ideas, realize how little you know — or could ever know — about any of them, and give up and think that there might truth in all of them. Stu is not just saying that simple is not a subjective topic, but that there exist topics that are not subjective. There is right and wrong in the world. Black and white. It doesn’t help that the colors change as we learn more, but don’t give up and don’t bow to sheer pressure or overwhelming opinion. In other words, when someone is wrong on the internet, they might actually be wrong).
"people are far more anxious to express their approval or disapproval of things than to describe them" – C.S. Lewis [15]
Simple has been watered down through this long human drive to approve or disapprove of things. This strikes me as a cognitive reaction to ambiguity. As a being of finite mental capacity, you have to accept that there exist many things you will know nothing about — could never live long enough to know everything about — and you have to short-circuit to make a decision to act on or trust information to be true somehow. Approval is an excellent way to do so, it’s not very rigorous and it can be transferred through networks of trust)

I had to duck out to catch my flight just as Stu started to apply these insights to Clojure. But I rather like that I didn’t hear the points. At least until the video is put online, I’m going to think of it as an exercise left to me to make my own connections.

Advertisements

clojure refs are functions of their current value

Posted in clojure by youngnh on 09.24.10

I was cleaning up some code that looked like this:

(get-nested (:mappings engine) db-name mapping-name)

Where get-nested was a one-off utility function that returned the value two keys deep in a doubly nested map, like so:

(get-nested {:key1 {:key2 "value"}} :key1 :key2) ;; => "value"

There is a Clojure function that does exactly this, only with slightly different syntax, get-in:

(get-in {:key1 {:key2 "value"}} [:key1 :key2]) ;; => "value"

Imagine my surprise when replacing the original code with

(get-in engine [:mappings db-name mapping-name])

blew up in my face. I checked the definition of get-nested to see if it was doing anything extra I wasn’t aware of:

(defn get-nested [map key1 key2] 
  (if-let [map2 (map key1)] 
    (map2 key2))) 

Nope. Looks pretty standard. It was at this point that my coworker informed me that (:mappings engine) returned a ref holding a map. Now I was really confused. This explained why my get-in form blew up, you can’t get a value from a map if it’s in a ref, you have to deref it first, but the get-nested code had been working in our software in heavy use for weeks.

Turns out, get-nested was very fortuitously written in a style that takes advantage of a little corner of Clojure I hadn’t previously known about. When a ref is invoked as a function, it returns the deref’d value of itself.

It’s not uncommon to write a form like:

({:key1 "value"} :key1)

The map is a function of its keys. Passing it a key returns the corresponding value in the map. This is a form I’m used to seeing. Turns out, this works equally well:

(def narwal (ref {:key1 "value"}))
(narwal :key1)

Since a ref is a function of it’s current value, it gives Clojure whatever object it’s currently holding, and Clojure tries to call invoke on that. It is equivalent to writing:

(@narwal :key1)

But where the first form will always return “value”, refs are updateable and over the course of your program, :key1 could have any value or may not exist at all. The above form is not referentially transparent.

You can see that refs are IFns by:

(supers (class (ref {}))) ;; => #{java.lang.Object java.lang.Runnable clojure.lang.IRef clojure.lang.IDeref clojure.lang.IFn clojure.lang.ARef java.util.concurrent.Callable clojure.lang.IReference clojure.lang.IMeta clojure.lang.AReference java.lang.Comparable}

or dig into the Clojure source code on github and look at the implementation yourself.

Tagged with: , ,

stumbling towards the clojure api

Posted in clojure by youngnh on 08.26.10

Here are two examples of code I’ve written and re-written lately:

My first example came about from dealing with DOM Document elements and Nodes, specifically getting a named attribute from a given Node:

;; bad, throws a NullPointerException if any of the Java methods return nil
(defn get-attribute1 [elt attr]
  (.. elt getAttributes (getNamedItem attr) getValue))

;; this works great
(defn get-attribute2 [elt attr]
  (when-let [attrs (. elt getAttributes)]
    (when-let [item (. attrs (getNamedItem attr))]
      (. item getValue))))

;; because lord knows I love monads
(require '[clojure.contrib.monads :as m])
(defn get-attribute3 [elt attr]
  (m/domonad m/maybe-m
    [attrs (. elt getAttributes)
     item (. attrs (getNamedItem attr))]
    (. item getValue)))

;; a slight improvement
(defn get-attribute4 [elt attr]
  ((m/with-monad m/maybe-m
     (m/m-chain [(memfn getAttributes)
		 #(. % (getNamedItem attr)) ;; take note that (memfn getNamedItem attr) will _not_ work here
		 (memfn getValue)]))
   elt))

;; recommended (look at the source of .?. for yet another way to express this, as well as some other nil-safe variants)
(use '[clojure.contrib.core :only (.?.)])
(defn get-attribute5 [elt attr]
  (.?. elt getAttributes (getNamedItem attr) getValue))

My second example came about while trying to remove duplicates from a list while keeping the unique elements in the order they were originally given.


;; this is no good, it doesn't preserve order
(defn no-dupes0 [lst]
  (into #{} lst))

;; this only works for objects implementing comparable, and even then would
;; need to be done so in a clever way that incorporates the list which the elements came from
;; as it's not a natural ordering
(defn no-dupes1 [lst]
  (sorted-set lst))

;; at a loss for what to do, I banged out this loop/recur solution quickly
(defn no-dupes2 [lst]
  (loop [result (list)
	 seen #{}
	 xs lst]
    (if (empty? xs)
      (reverse result)
      (let [x (first xs)]
	(if (contains? seen x)
	  (recur result seen (rest xs))
	  (recur (conj result x) (conj seen x) (rest xs)))))))

;; this took me a long time to come up with as I'd never tried to write
;; a lazy sequence without higher level functions like map, iterate, et al
;; http://clojure.org/lazy was a great help figuring out how to use lazy-seq
(defn no-dupes3 [lst]
  ;; I have long wondered when to use letfn instead of let when defining functions
  ;; this is a good example, as a fn defined in a let isn't visible to itself
  (letfn [(internal [lst seen]
	    (lazy-seq
	     (when (seq lst) ;; learned: (seq lst) can be used as the opposite of (empty? lst)
	       (let [x (first lst)]
		 (if (contains? seen x)
		   (internal (rest lst) seen)
		   (cons x (internal (rest lst) (conj seen x))))))))]
    (internal lst #{})))

;; recommended (look at the source to see the pros at work.
;; it uses destructuring to avoid my (let [x (first lst)] clause
;; as well as recur, which probably saves the function from blowing the
;; stack on ridiculously long seqs)
(defn no-dupes4 [lst]
  (distinct lst))

Sum to a Million

Posted in clojure by youngnh on 08.09.10

I spent this past weekend trying to sum the numbers from 1 to a million as fast as possible. Here’s what I learned.

This function ran very fast:

(defn bigsum1 []
  (let [sum (atom 0)]
    (dotimes [i 1000000]
      (swap! sum + (inc 1)))
    @sum))

This one only barely slower:

(defn bigsum2 []
  (apply + (range 1 (inc 1000000))))

I ran each one of the above in a “test harness” like so:

(dotimes [i 100]
  (time (bigsum1)))

It’s not a perfect solution, but it gives you a pretty good idea of how long a typical run lasts.

Out of curiosity, I ran the following to get a sense of how long it takes to walk a list, since, if I understand what Clojure’s doing under the hood correctly, last and count both have to walk the given lazy list from head to tail without taking any shortcuts.

(last (range 1 (inc 1000000)))
(count (range 1 (inc 1000000)))

Interestingly enough, last and count both took longer than the dotimes and apply forms.

I thought perhaps spawning a bunch of threads, each to update a counter might perform better, which was the motivation for my next variation:

(defn bigsum3 []
  (let [sum (agent 0)]
    (dotimes [i 1000000]
      (send sum + (inc i)))
    (await sum)
    @sum))

I think agents nicely match the solution I was trying for. I created a thread-safe object to hold the sum and then dispatched a bunch of updates to it using the send function which would be executed in a Clojure-managed threadpool. + is associative and commutative so it doesn’t matter what order those updates were actually executed in as long as they all got executed. So finally, the code makes sure that all of those updates got executed by calling await before returning the current value of the sum.

This function took a looong time. Much longer than I expected it to, and much longer than any variant before it. Before actually using them, I had it in my mind that agents used threads and threads were another word for speedspeedspeed. I thought that, for sure, this code would run faster than bigsum1 and 2. Not out of any reasoned analysis, but simply based on the fact that they used those magical things called threads. I’m not saying that Clojure agents are slow, or that with some tweaking I couldn’t get an agent-based function to run very very fast, merely that my initial impressions of them were askew. I would love to hear from the Clojure-verse how I might tweak my agents example to make it fly.


Side-bar: What not to do:
Faced with the initial slowness of the agents solution, I was tempted to write this:

(defn bigsum4 []
  (let [sum (atom 0)]
    (dotimes [i 1000000]
      (.send (Thread. (fn []
        (swap! sum + (inc i))))))
    @sum))


I figured that maybe if I spawned the threads myself and used a thread-safe atom, my code would be faster than bigsum3. Not only was this code no faster than bigsum3, it’s incorrect. All of the threads it spawn may not have finished by the time it derefs sum. This code is very likely to return some intermediate result, atomically updated, but no less wrong. The lesson I learned here was not to confuse thread-safe with “don’t have to think about threads”.

7 Rules for Writing Clojure Programs

Posted in clojure by youngnh on 07.26.10

Over the past 5 months, I’ve had the incredible opportunity at Revelytix to write Clojure every day and get paid for it. 5 months is an incredibly short time to pretend to have learned anything, but I can feel the beginnings of a style emerge in my programming and while writing a small program some ideas congealed into actual words that I thought I’d capture here.

Update: Ugh. I really messed up. As it has been noted in the comments below, on Hacker News and even Twitter, my final solution is much (much) slower thanks to it’s not one, but two sorts. In the end, the whole thing is doubly redundant as clojure.contrib.seq-utils implemented a function ‘frequencies’ which will be in 1.2’s clojure.core. It uses ‘reduce’ and you should too. Don’t fall in love with your code kids, it can’t love you back.

#1 – Your brain will think in steps. It can’t help it
#2 – loop/recur is not a code smell, but be suspicious of it
#3 – Transformations of lists, not recursion is key
#4 – Don’t be lazy
#5 – One liners are hard to read
#6 – Use -> and ->> to make one liners easier to read
#7 – Don’t use -> or ->>

The program that brought these ideas to life was a small utility I needed to read a file and print out the set of characters contained within along with the number of occurrences of each character. Pretty simple.

Rule #1 – Your brain will think in steps.

I consider myself to be a fairly competent functional programmer. Of course, my first thoughts at solving this character counting program ran along these lines: Pull each character out of a string individually and increment a count for each one. I even put together an eight line (8!) function to do exactly that:

(defn char-counter [str]
  (loop [counts {}
         s str]
    (if-not (empty? s)
      (let [c (first s)]
        (recur (assoc counts c (inc (get counts c 0)))
               (rest s)))
      counts)))

Most programmers would stop here. I wish I could.

Rule #2 – Be suspicious of loop/recur

One-liners are the epitome of brevity, and brevity, the epitome of wit. Technically speaking, all Clojure s-expressions are one-liners since they are merely nested lists. But this is similar to saying that all C programs could be one-liners thanks to that handly ; character. Clojure has a natural kind of indentation and line-breaking. let and loop and things with binding forms all naturally form multi-line statements in Clojure. Not all programs are expressible as one-liners, but beyond a doubt, no one-liners use loop/recur.

Clojure, and more generally, functional programming, gain their power from their expressiveness. In drab, unimaginative Java, or even more awesome programming languages like Python and Ruby, the programmer has very little power over what concepts they can express directly in the language. At the very least, these concepts must be formed by some combination of syntax and language features. Clojure concepts, almost without fail, exist as individual symbols. When I decided that I wanted to make my eight-liner a one-liner, I was really deciding to express my program with a minimal set of complementary and orthogonal concepts.

Rule #3 – Transformation not recursion

Functional programming is about functions, right? So what could be more functional than a function calling itself? Just read the Wikipedia article on fixed-point combinators, the theoretical functions that allow functions to call themselves, it’s chock full of lambdas. Real functional programmers use recursion.

Recursion has its place in functional programming, but it’s not pervasive. Recusive-ness (recursivity?) is a property of the problem being solved. Repeating, nested computations on similar data structures are usually good candidates for recursion. Our problem is not. Instead, I reformulated how I might solve this problem. Here’s briefly the data structures that flashed through my mind in quick succession, forming a plan of possible action:

"abcdaabccc"

(a a a b b c c c c d)

((a a a) (b b) (c c c c) (d))

((a 3) (b 2) (c 4) (d 1))

((c 4) (a 3) (b 2) (d 1))

All lists. And each transformation vaguely indicates what Clojure function I should apply.

"abcdaabccc"
sort
(a a a b b c c c c d)
split it up
((a a a) (b b) (c c c c) (d))
count
((a 3) (b 2) (c 4) (d 1))
sort
((c 4) (a 3) (b 2) (d 1))

Rule #4 – Don’t be Lazy!

I couldn’t just write (sort (split-it-up (count (sort "abcdaabccc")))) not only because that would have evaluated my desired transformations in reverse order, but also because that’s a broad-strokes solution, the details would get in the way of that actually working. Here’s what I came up with instead:

(defn char-counter [str]
  (sort-by second >
           (map #(list (first %) (count %))
                (split-it-up #(not (= %1 %2))
                             (sort str)))))

I didn’t include my definition of split-it-up here, since the way I wrote it was a microcosm of bad style that suffers recursively from all the problems this blog post is trying to address. Consider the above code snippet what not to do.

My second attempt suffered from laziness. Not the kind that you can solve with a doall but the kind that encourages you to use a whole lot of helper defns and lambdas. Rule #4 is not about evaluation. It is recognizing that the distillation of a concept can be no more pure in Clojure than when it is expressed as a single symbol. Just because you can think of two or three intellectually easy ways to write something doesn’t mean you’ve done it justice.

Above, a helper function split-it-up that weighed in at as many lines of code as my original solution to this problem. Also, the (map #(list (first %) (count %) was an easy way to get the two values I needed from every sublist. It’s readable and it works, but it’s lazy.

Ask yourself why there isn’t a concept for what you’re trying to write already. Why isn’t it already a built-in function? Always push to use a single symbol, a function you didn’t write, where you might otherwise invoke one you did or use the ubiquitous map with an anonymous function.

Rule #5 – One liners are hard to read

The calls to sort were already well known concepts for me and work well here. The split-it-up and atrocious
(map #(list (first %) (count %)) calls needed to go.

There are some guidelines to finding functions that you didn’t know existed. Brute-force search is a pretty good start. clojure.core isn’t all that large. Starting from a similar function is a good strategy for breaking up your search-space. In my case, I had heard of partition and knew it’s operation to be very similar to what I wrote split-it-up to do. Clojure 1.1 doesn’t have it, but the to-be-released-any-day-now Clojure 1.2 has partition-by which turned out to be exactly what I needed.

I was totally lost as to how to further distill the second expression. I knew I wanted something that used first and count and little else. The call to list and placing the arguments for evaluation were plumbing, something that wasn’t directly related to my problem. This seemed to me to be a new concept I had no tool for. Provided a list of functions, I wanted to get a new function that returned the result of collecting their application.

I considered a couple of approaches. do is a form similar in spirit. Besides the fact that it’s not a pure function, it takes some computations and performs each one. I could have written my own combinator and solved the plumbing case in general with something like:

(defn frobulate [fs]
  (fn [& vals]
    (map #(apply % vals) fs))

But for some reason it got wedged in my head that perhaps iterate might hold the key to a terse expression of this idea. It didn’t. Two functions down from that in the docs however, I found juxt. Exactly what I needed.

A one-liner emerged:

(sort-by second > (map (juxt first count) (partition-by identity (sort "abcdaabccc"))))

Rule #6 – -> and ->> are your friend

It still reads backward. The revelation that I should be using ->> came to me way back at Rule #3. I was applying multiple transformations to a single sequence, which the ->> macro exists to make more readable.

(->> str
     sort
     (partition-by identity)
     (map (juxt first count))
     (sort-by second >))

Rule #7 – Don’t use -> and ->>

If you’ll notice, we’ve now transformed our crappy, procedural series of steps into a beautifully constructed and elegantly distilled…series of steps. Don’t worry so much about it. This is, to my eye, another one of the great circle-of-life symmetries (ourobous?) of computation. Code is data, data is code, procedural steps beget functional evaluation beget a series of procedural steps. Just to be clear though, I still think you should use ->>. In our case, I like it’s readability better than our one-liner. But if you sat down next to me, explained your fantastic transformative functional ideas and then the next 4 characters you typed were the threading macro, I’d take the keyboard away from you and bludgeon you to death with it. Because of Rule #1, Rule #5 becomes a dangerous weapon. Most of the time, a let would suffice to make our first, wild attempts readable and flexible. In our case, we could:

(let [a (sort str)
      b (partition-by identity a)
      c (map (juxt first count) a)]
  (sort-by second > c))

but it’s pathological.

Parting thought. juxt was a bit of a revelation for me. 5 months ago, I never would have even looked for an operator to solve what I could write myself to suit my needs just as well. The cold, stark expressiveness of
(juxt first count) makes me wonder if now that you hold in your hands the 4 steps of character counting: sort (partition-by identity) (map (juxt first count)) and (sort-by second >) you could combine them with one other operator, (instinctively I want to say reduce?) to write a one liner that doesn’t do so much as be.

fixies

Posted in clojure by youngnh on 03.24.10

write tests independent of their environment

As part of a larger effort at Revelytix, we’re building a Clojure lib to work with Semantic Triplestores. This post sums up my experiences testing our code with clojure.test (formerly known as clojure.test-is).

This is long. If you’re only mildly interested, skip to the good parts.

There are 5 methods I want to test: create-graph, drop-graph, insert-triple, load-file, and query.

In this post, I use two specific Semantic Web concepts, the graph and the triple. The analogy is poor, but you can think of a graph as database table, and a triple as row in a table.

My overarching concern here is that I’d like my tests to be as idempotent as possible. Once run, they should restore the state of the external triple store so that I can run any test individually or run the entire suite in any order without having to check if errors are due to incorrect preconditions. In fact, this is a requirement as clojure.test‘s run-tests function states in it’s documentation that it will run the defined tests in whatever flipping order it feels like (paraphrasing mine).

The first test I wrote was for create-graph. However, it right away violated my principle of restoring the state of the external world because it leaves a graph laying around.

(deftest test-create-graph
  (is (resultMessage= "Successfully created graph rmi://australia/server1#family"
                      (execute-command (create-graph *FAMILY*)))))

Maybe create-graph was the wrong choice for a first test. The next likely candidate is drop-graph, however, it will throw an exception if you ask it to drop a graph that doesn’t exist yet, so it’s not a perfect choice either. Really, if I’m to get away with this, I need to test both functions execution in a predefined order; my test won’t run and clean up after itself until both functions are implemented correctly.

Now, there’s 2 ways I could write a test that executes create-graph and drop-graph in a specified order and cleans up after itself:

I could write a single deftest form for each, and even though I said that run-tests does not guarantee that the tests will be run in the order they appear in the source file, the deftest form turns your test into a regular fn, callable just like any other you might create via the defn form. This allows us to then deftest a combination of both that will guarantee that they run in the right order.

(deftest test-create-graph
  ...)

(deftest test-drop-graph
  ...)

(deftest test-create-and-drop-graph
  (test-create-graph)
  (test-drop-graph))

This introduces a new problem, though. clojure.test now thinks you have 3 tests. deftest- will make the create and drop tests private, and only the composed test is now seen by clojure.test.

(deftest- test-create-graph
  ...)

(deftest- test-drop-graph
  ...)

(deftest test-create-and-drop-graph
  (test-create-graph)
  (test-drop-graph))

The second possiblity is to use the testing macro to label each part. I like this solution less since it doesn’t produce two individually runnable tests, but functionally it ensures that the tests run in order and aesthetically it describes itself quite nicely.

(deftest test-create-and-drop-graph
  (testing "Create Graph"
    (is (resultMessage= "Successfully created graph rmi://australia/server1#family"
			(execute-command (create-graph *FAMILY*)))))
  (testing "Drop Graph"
    (is (resultMessage= "Successfully dropped graph rmi://australia/server1#family"
			(execute-command (drop-graph *FAMILY*))))))

Next, I want to test the insert-triple and load-file functions. Neither make much sense to call without an existing graph to work with. I could, in each test, create a graph and then tear it down:

(deftest test-insert-triple
  (try
   (execute-command (create-graph *FAMILY*))
   (is (resultMessage= "Successfully inserted statements into rmi://australia/server1#family"
                       (execute-command (insert-triple *FAMILY* "Joe" "father_of" "Billy"))))
   (finally
    (execute-command (drop-graph *FAMILY*)))))

(deftest test-load-file
  (try
   (execute-command (create-graph *FAMILY*))
   (is (resultMessage= "Successfully inserted statements into rmi://australia/server1#family"
                       (execute-command (load-file *FAMILY* *N3_FAMILY_FILE*))))
   (finally
    (execute-command (drop-graph *FAMILY*)))))

The try/finally form is especially well-suited for testing since by contract it will ensure that despite any exceptions the code under test may throw, the graph will get dropped. It also has the benefit of returning the value of the last assertion and not the value of dropping the graph. By wrapping your assertions in a try/finally you’ve preserved the external “value” of your test while still dramatically altering what goes on when it’s called. That’s a pretty powerful encapsulation technique.

However, we’ve duplicated code setting up and tearing down the graph, so this would be a great place for a fixture. Fixtures in clojure.test are just regular functions that take a function to execute before or after the fixture’s done it’s thing:

(defn family-graph-fixture [f]
  (try
   (execute-command (create-graph *FAMILY*))
   (f)
   (finally
    (execute-command (drop-graph *FAMILY*)))))

The tests for insert-triple and load-file can now assume that their environment has been setup for them and can focus solely on the meat of their test:

(deftest test-insert-triple
   (is (resultMessage= "Successfully inserted statements into rmi://australia/server1#family"
                       (execute-command (insert-triple *FAMILY* "Joe" "father_of" "Billy")))))

(deftest test-load-file
  (is (resultMessage= "Successfully inserted statements into rmi://australia/server1#family"
                      (execute-command (load-file *FAMILY* *N3_FAMILY_FILE*)))))

Before we actually hook the fixture up to our tests, let’s take a look at the test for query. This function will blow up if it tries to select triples from a non-existent graph and we’ll want that graph to be populated with sample data. We already have a fixture for creating and dropping a graph, we can reuse that and add another for populating the graph with data.

(defn family-data-fixture [f]
  (execute-command (load-file *FAMILY* *N3_FAMILY_FILE*))
  (f))

We’re almost there. The execute-command function runs the given triplestore command on a dynamic var named *connection*. We’ll need a fixture to bind that var to a real connection.

(defn connection-fixture [f]
  (binding [*connection* (connect-to *HOST*)]
    (try
     (f)
     (finally
      (dispose *connection*)))))

We could then attach that fixture to every test we execute, but that would establish and dispose the connection on every test execution. We can save a lot of bandwidth and execution time by having every test use the same connection. Telling clojure.test to execute all tests in this fixture is simple, just put this form in your test file:

(use-fixtures :once connection-fixture)

Now we need to hook everything else up. clojure.test would suggest using (use-fixtures :each ... ) but that would establish all of the fixtures for every test. As far as I can tell, clojure.test doesn’t have a proscribed way to map individual fixtures to individual tests, so we have to apply our fixtures and compose the individual tests by hand. You override run-tests default execution by defining a test-ns-hook function with your setup-by-hand tests.

(defn test-ns-hook []
  (test-create-and-drop-graph)
  (family-graph-fixture test-insert-triple)
  (family-graph-fixture test-load-file)
  (family-graph-fixture #(family-data-fixture test-query)))

Being able to pass around fns makes the whole process quite easy. Note that I have to pass an anonymous function when nesting fixtures, but it doesn’t add too much line-noise and is written in execution-order, which increases readability. Also, even though I haven’t shown it in my above snippet, since we are defining how to run the tests with test-ns-hook we have taken responsibility for which tests get run, and no longer have to define any tests using the deftest- form.

However, now that we’ve done that, we’ve broken our (use-fixtures :once ...) statement. Calling run-tests will no longer establish the connection, and so need to add a final layer of indirection.

(deftest test-suite
  (test-graph-commands)
  (family-graph-fixture test-insert-triples)
  (family-graph-fixture test-load-file)
  (family-graph-fixture #(data-fixture test-query)))

(defn test-ns-hook []
  (connection-fixture test-suite))

Here’s the final result.


My co-worker (thats-right-I-am-dangerous-Iceman) Alex Miller points out that though it is prominent and publicly available, the front page of the clojure.test API is sometimes the last place you look to get help with this stuff, but the Overview documentation for the lib is excellent. Check it out.

tl;dr

  • Fixtures as they exist today in clojure.test aren’t quite flexible enough to cover all of my testing needs, but there are not-overly-complicated ways to get around that.
  • Keep tests short, testing only what concerns them and leave out all environment setup code.
  • Specify setup and teardown code in their own fixture functions for maximum reuse and composability.
  • Use a suite test to hook up fixtures to individual tests and test-ns-hook to run the suite
Tagged with: ,