Mocking frameworks are like cologne
Mocking frameworks are like cologne … they help make your tests smell better, but they can cover up some really stinky code.
I was working my way through some old bookmarked blog posts when I ran across Decoupling tests with .NET 4 by Mr. Berridge, and it reminded me of a discussion I had a few weeks back while teaching a refactoring workshop. The discussion centered around the question: when should I use a mocking framework, and when should I use real objects or hand-coded stubs?
*** I will leave the questions of the differences between stubs & mocks, and the merits of testing in true isolation versus using real objects, for another day ***
Mocking frameworks like Mockito or Moq are great tools, but lately I have been finding myself using them less and less. I have found that the more I use mocking frameworks the more easily I break the Single Responsibility Principle. Now, this obviously isn’t the fault of the tool. It is my fault. And my pair’s. The real problem is that we grew accustomed to using the tool and began to lean on it as a crutch. As we added more features our code took on more and more responsibilities; but we didn’t feel the pain directly because Mockito was doing more and more of the heavy lifting.
In Kevin’s post, he specifically talks about TDDing an object with many dependencies and the pain of updating each test as new dependencies were added. Now, I have a ton of respect for Kevin and his code … but I wonder if another solution to his problem would be to re-examine why the object needed so many dependencies. Is it doing too much? Could these dependencies be re-described and possibly combined into the overall roles they fulfill?
Kevin’s solution was clever and a nice tip to keep in my pocket. The devil is in the details. As I try to “follow the pain” I am trying to write more hand-coded stubs and evaluate any time I find myself relying on side-effects to verify how my code works. Wish me luck.
Copyright 1988
I found this in the book I’m working my way through. It’s a bit obvious, but see if you can’t tell what it does:
10 REM unknown
20 INPUT L$(1), L$(2), L$(3), L$(4)
30 PRINT
40 FOR I1=1 TO 4
50 FOR I2=1 TO 4
60 IF I2=I1 THEN 130
70 FOR I3=1 TO 4
80 IF I3=I1 THEN 120
90 IF I3=I2 THEN 120
100 LET I4=10 - (I1 + I2 + I3)
110 LPRINT L$(I1); L$(I2); L$(I3); L$(I4);
120 NEXT I3
130 NEXT I2
140 NEXT I1
150 END
The book is Umberto Eco’s Foucault’s Pendulum.
When IS the last responsible moment?
As agile software developers we work hard to make the development process as flexible and lean as possible for our customers. We work hard to estimate accurately, deliver working code quickly, and only do the work needed to finish the story right in front of us. This discipline is hard, but it keeps our customers in the loop and getting the most for their money and time. In order to keep up with the customer’s ever-changing requirements we often push off decisions to the last responsible moment. By delaying expensive decisions until our hand is forced, we reduce waste and can stay focused on delivering value with each finished story.
But, when is the last responsible moment? Who decides? Has it changed?
With our heads down in the trenches, knocking out stories left and right, and moving fast toward our client’s vision … we have to keep these questions in mind. A recent project reminded me of this. Our team did a great job of empowering our client to shift our priorities as their business understanding changed. We offered options, gave accurate estimates for stories, and allowed them to shift the work at hand each iteration. We did exactly what we had set out to do.
Unfortunately, we were so focused on short-term flexibility and options that the higher level vision began to slip, and some decisions were made AFTER the last responsible moment.
The key word here is responsible. While we work to increase our productivity, reduce waste, and keep our customer’s immediate desires met … we need to keep in mind that the short-term decisions add up. Far-off decisions and the high-level vision don’t need to be fully spec’ed and decided on day 1, but in the same way we maintain our suite of tests as requirements change, short-term decisions need to be reconciled with the high-level vision. A great team keeps in mind when this last responsible moment is for different kinds of decisions and revisits these questions frequently.
</rant>
Chasing the Sun
As an incomplete response to Ben’s long-ago challenge, “Long Walk After the Sun”, I found 2 websites that gave me enough information to start calculating the answer.
The first is the Naval Oceanography Portal, a veritable plethora of astronomical almanacs, calculators and errata. Their Sun Altitude/Azimuth Table Calculator is particularly useful. Plug in a date and a town in the United States and it will calculate for you, down to one-minute resolution, exactly where in the sky you will find the sun from sunup to sundown.
Using their software to generate a table of headings, you would then know exactly which direction your hypothetical man would be traveling in at any given time of day.
The next useful online calculator I found was on the Federal Communications Commission’s website, of all places. It has a calculator which will determine “Terminal Coordinates Given a Bearing and a Distance”. You plug in a starting latitude and longitude, as well as a distance to travel and your bearing, and it will spit out the latitude and longitude you’ll end up at.
From Wikipedia, humans walk at a pace of about 5km/h, or about 83 meters per minute. Given a set of starting coordinates, you could look up a heading from the Azimuth/Altitude chart, and set the distance traveled to (83 meters x the number of minutes walking) and get a new location. If you had the patience to calculate a very large number of items, you could trace the path from sunrise on January 1st, 2009 to sundown on December 31st, 2009.
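That per-step calculation can be sketched with the standard forward dead-reckoning formula on a spherical Earth (a rough stand-in for what the FCC calculator computes; the function and constant names here are my own):

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; a spherical approximation

def step(lat_deg, lon_deg, bearing_deg, distance_m):
    """Given a start point, an initial bearing (degrees east of north) and
    a distance in meters, return the destination latitude/longitude."""
    lat1, lon1 = math.radians(lat_deg), math.radians(lon_deg)
    brng = math.radians(bearing_deg)
    d = distance_m / EARTH_RADIUS_M  # angular distance in radians

    lat2 = math.asin(math.sin(lat1) * math.cos(d) +
                     math.cos(lat1) * math.sin(d) * math.cos(brng))
    lon2 = lon1 + math.atan2(math.sin(brng) * math.sin(d) * math.cos(lat1),
                             math.cos(d) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)

# ten minutes of walking at 83 m/min toward a sun bearing of 105.8 degrees
lat, lon = step(38.6333, -90.25, 105.8, 83 * 10)
```

Iterating this, looking up a fresh bearing from the azimuth table every few minutes, is exactly the hand calculation described above.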
Or someone could write a scraper. Or better yet, look up the algorithms these sites use to calculate these sort of things and not hammer their servers for curiosity’s sake.
I did a few calculations by hand (and by “by hand” I mean I copied and pasted numbers into web forms), and here’s the track of a person following the sun today, November 17th, 2010, from sunrise at 5:50 AM until he gets tired and passes out at 8:00 AM.
Time   Latitude     Longitude    Bearing (E of N)
05:50  38 38 0 N    90 15 0 W    105.8
06:00  38 37 53 N   90 14 27 W   107.3
06:10  38 37 45 N   90 13 54 W   108.8
06:20  38 37 36 N   90 13 21 W   110.2
06:30  38 37 27 N   90 12 49 W   111.8
06:40  38 37 17 N   90 12 17 W   113.3
06:50  38 37 06 N   90 11 45 W   114.8
07:00  38 36 55 N   90 11 14 W   116.4
07:10  38 36 43 N   90 10 43 W   118
07:20  38 36 30 N   90 10 13 W   119.6
07:30  38 36 17 N   90 9 43 W    121.3
07:40  38 36 03 N   90 9 14 W    123
07:50  38 35 48 N   90 8 45 W    124.7
08:00  38 35 33 N   90 8 17 W    126.5
Something that stands out is that in the northern hemisphere, the sun sweeps an arc from something like 60 to 300 degrees E of N in a single day. That’s during the summer. In the winter months the sun is lower in the sky (it spends less time above the horizon, so less time walking) and sweeps a smaller arc, but one still mostly centered south. Also, the bearing of the sun changes more slowly around noon. I’d venture to guess that the time you’d spend walking east and west wouldn’t be as long as the time you spent walking mostly south. In a single day, a person wouldn’t cover very much ground over the surface of the earth. There are about 111 kilometers between successive degrees of latitude, so even traveling in straight lines, it’d take close to 2 ten-hour days to drop a degree of latitude. Over the course of an entire year, though, that might add up to some serious displacement.
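The back-of-the-envelope arithmetic, in code (83 m/min from the Wikipedia figure, roughly 111 km per degree of latitude, and a generous ten-hour walking day):

```python
metres_per_min = 83
minutes_walking = 10 * 60                                # a ten-hour walking day
km_per_day = metres_per_min * minutes_walking / 1000.0   # 49.8 km per day
km_per_degree_lat = 111.0                                # roughly constant anywhere on Earth
days_to_drop_a_degree = km_per_degree_lat / km_per_day   # about 2.2 days
```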
I’m now very interested in this problem, and as a shot-in-the-dark guess, I’m going to say that our wandering man would end up somewhere on the line where the sun traces a perfect east-to-west path overhead, as north of it he’d be drawn south and south of it he’d be drawn north. I’m not sure that’s the equator, though, since the earth is tilted on its axis. Ah, more problems already brewing and I haven’t even properly finished this one yet. Time to stop typing.
A few other websites I stumbled across while researching this post provide some illuminating links:
The Griffith Observatory in Los Angeles, in particular this page which turned me on to the possibility of not having to derive all the math myself because humans have been looking at the sky since the dawn of their existence and who would’ve thunk that at least one of those dirty apes thought to write down what they saw.
This website’s FAQ has a tip for a dead-tree book with Astronomical Algorithms in it.
HTML 5 Game of Life – Canvas Tag
In a previous post I described how I created a simple version of John Conway’s Game of Life using HTML 5 Web Workers for multi-threaded matrix calculations: HTML5-GoL. Once each new matrix is calculated, I needed to display it somewhere. In order to keep rendering (and therefore UI thread overhead) at a minimum I decided to only pass back the delta from the previous matrix to the current one (those squares with new life and recently deceased). The end result was a pretty fast implementation that could render fast enough for smooth transitions.
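The delta idea itself is small: if each generation is a set of live cells, only the births and deaths need to cross the thread boundary. A minimal sketch of that diff (in Python rather than the project’s JavaScript, with made-up names):

```python
def grid_delta(old_live, new_live):
    """Diff two generations, each a set of (x, y) live-cell coordinates,
    returning only the cells that changed."""
    born = new_live - old_live   # squares with new life
    died = old_live - new_live   # recently deceased squares
    return born, died

# a blinker oscillating between horizontal and vertical
born, died = grid_delta({(0, 1), (1, 1), (2, 1)}, {(1, 0), (1, 1), (1, 2)})
```

The renderer then only has to fill the born cells and clear the dead ones, instead of redrawing the whole board.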
Canvas Tag – Simple 2D bitmap graphics
The canvas tag was first introduced by Apple in WebKit and provides a simple block area for rendering 2D bitmap graphics. This differs from SVG, which renders individual graphic primitives that can be re-rendered and modified as DOM elements. Canvas graphics are simple drawing, compositing, translation, and image-manipulation operations.
In order to draw each Game of Life matrix, we first need the canvas area:
<!DOCTYPE html>
<canvas id="gol_canvas" width="401" height="401"></canvas>
Next we need to get the 2D context from the canvas DOM element.
var context = document.getElementById('gol_canvas').getContext('2d')
Once we have the 2D context we can start drawing simple shapes (lines, blocks, circles, etc.) or do any number of other 2D actions. For this simple Game of Life example I created a new GoLCanvas object to hold all of the GoL-specific actions (e.g. create life at x,y, clear the board, etc.). Overall the drawing API is simple, if a bit archaic (it reminds me of my CS course in computer graphics where we did very basic OpenGL commands in C++).
Taken from gol-canvas.js
var draw_background = function() {
  context.clearRect(0, 0, size, size);
  context.fillStyle = COLORS.BACKGROUND;
  context.fillRect(0, 0, size, size);
};
The canvas work was simple and effective, and it only touches the surface of what can be done. There are a ton of great tutorials on the canvas tag and what can be done with it.
symbol-macrolet and inside-out test fixtures
Rolling right along with more ways to write fixtures, let’s say we start with a test that looks like this:
(deftest test-convoluted
(with-connection hdb
(try
(do-commands "create table person (name varchar(255))")
(insert-values :person [:name] ["bill"] ["joey"])
(with-query-results results ["select name from person"]
(is (= "bill" (:name (first results))))
(is (= "joey" (:name (second results)))))
(finally
(do-commands "drop schema public cascade")))))
It does its setup and teardown inside of the deftest itself. The actual “testing” parts are the two (is) forms deeply nested inside. I’ve already written a post on how the clojure.test lib that comes with Clojure addresses this kind of complexity by providing fixtures. However, they’re a bit clumsy and aren’t particularly fine-grained. You can either run them once before (and after) all tests run, or once before and after each test runs.
The Common Lisp testing framework FiveAM has a different approach to defining and using fixtures. The FiveAM library defines two macros, def-fixture and with-fixture, which define a fixture form and execute forms inside of a named fixture, respectively.
To define a fixture that takes care of the setup and teardown in the test above, we would write something like this:
(def-fixture person-schema []
(with-connection hdb
(try
(do-commands "create table person (name varchar(255))")
(insert-values :person [:name] ["bill"] ["joey"])
(with-query-results results ["select name from person"]
(test-body))
(finally
(do-commands "drop schema public cascade")))))
In the above, test-body is a special captured variable that will be replaced with whatever forms you later specify. It’s where the meat of your test will go. You specify what to run there as the body of the with-fixture macro, thusly:
(deftest test-ideal
(with-fixture person-schema []
(is (= "bill" (:name (first results))))
(is (= "joey" (:name (second results))))))
The with-fixture form names the fixture we want to use, and takes care of expanding it in such a way that the variables are still checked by the compiler. If our fixture didn’t declare results in the scope that our assertions were expanded to, the compiler would complain just as if we had written the whole thing out by hand.

We can make the fixture in our example even more flexible. def-fixture can declare arguments and with-fixture can provide them. Altering our setup slightly to add a second column, we can then pass in a specific query to be run per fixture:
(def-fixture person-schema [query]
(with-connection hdb
(try
(do-commands "create table person (name varchar(255), age integer)")
(insert-values :person [:name :age] ["bill" 25] ["joey" 35])
(with-query-results results [query]
(test-body))
(finally
(do-commands "drop schema public cascade")))))
And then we can get a lot more mileage out of our solitary fixture:
(deftest test-ideal
(testing "name column"
(with-fixture person-schema ["select name from person"]
(is (= "bill" (:name (first results))))
(is (= "joey" (:name (second results))))))
(testing "age column"
(with-fixture person-schema ["select age from person"]
(is (= 25 (:age (first results))))
(is (= 35 (:age (second results)))))))
This is a trick that defmacro can’t easily perform. If we wanted to define person-schema as a macro, in order to capture results, we’d have to put ~'results somewhere in a backtick form. It’ll work, but for any significant number of capturing symbols, there are tildes and ticks everywhere. For a feature that would ostensibly have users writing lots and lots of their own expansions, that’s a major drawback, in my opinion. Early versions of the newly re-written ClojureQL had users do this, I believe, with little snippets of macros sprinkled throughout their code. It turns out that we can have our cake and eat it too, and you’ve probably guessed how from the title of this post.
I copped the implementation straight from FiveAM. Common Lisp has an advantage here, as the language has built-in local macros (Clojure’s are global to a namespace) and symbol macros, which FiveAM uses to great effect. However, Konrad Hinsen (of clojure.contrib.monad fame) has implemented local and symbol macros in the clojure.contrib.macro-utils lib. I used that.
My Clojure implementation consists of the two macros and a global dynamic variable that holds an atom mapping fixture names to their arguments and bodies. The def-fixture macro takes care of assoc-ing them as they are defined. The with-fixture macro pulls that fixture definition and constructs an anonymous function from it, as well as making test-body a symbol macro that expands to the given body of with-fixture.
(def *fixtures* (atom {}))
(defmacro def-fixture [name args & body]
`(swap! *fixtures* assoc '~name (cons '~args '~body)))
(defmacro with-fixture [name args & body]
(let [[largs lbody] (get @*fixtures* name)]
`(symbol-macrolet [~'test-body (fn [] ~@body)]
((fn ~largs ~lbody)
~@args))))
I could have used a local macro, by swapping the symbol-macrolet form for a macrolet form, but then I would have had to quote the body parameter passed to with-fixture. By using a symbol macro and asking that users treat it like a fn, I can avoid that. It’s a small thing and either way works. For 8 lines of code overall, these 2 macros add a lot of flexibility to how you can define test fixtures.
Tie Git Submodules to a Particular Commit or Branch
While working on getting QUnit-CLI cleaned up and refactored a bit, I realized I needed to tie the example code in the Git repository to a particular version of QUnit.js (those guys are making changes too fast for me to keep up). I have used SVN:externals previously, so Git submodules seemed like an obvious solution. A single submodule should allow me to keep QUnit-CLI pointing to a particular revision of QUnit.js without requiring me to separately document which version I was testing against.
The man page for git-submodule as well as the Git Book chapter on Submodules do a good job of documenting the command with some simple examples, but none that were 100% clear for my needs. In addition, I need my Git submodule to point to a specific commit (or branch) so that everyone cloning my code can consistently run my examples without fear that a new commit on HEAD will break something.
Step 1 : Add the submodule
With the main repository cloned, I need to add the QUnit submodule. First grab the GitHub URL for my QUnit fork (eventually this will be replaced with the main QUnit repo) and execute the “add” command from within your local repository root.
git submodule add git://github.com/asynchrony/qunit.git qunit
Afterward there will be two modified and staged objects in your repo: .gitmodules, which contains the submodule’s local path and source URL, and a new folder named qunit, which contains a full clone of the source repository.
** Fraser Speirs has a good writeup on what is going on behind the scenes with the Git internals and how the key to all of this is in the index files of each repo and the modes the changes are committed with. **
Step 2 : Fix the submodule to a particular commit
By default the new submodule will be tracking HEAD of the master branch, but it will NOT be updated as you update your primary repo. In order to change the submodule to track a particular commit or a different branch, change directory to the submodule folder and switch branches just like you would in a normal repo.
git checkout -b dev_branch origin/dev_branch
Now the submodule is fixed on the development branch instead of HEAD of master. Just as easily I could set it to a specific commit or tag.
Step 3 : Commit everything
Now from your primary repository you still have two modified objects: the .gitmodules file and the qunit folder. Committing these changes will persist the new submodule tracking your desired branch.
Step 4 : Clone Recursive
The next time you (or someone else) clones this repo, they will need to do one of two things.
A) Add the --recursive flag to their git clone command
git clone REPO_URL --recursive
B) Manually initialize and update the submodules after the clone
git clone REPO_URL
git submodule update --init --recursive
Fixture Macros
A few months ago at Revelytix, we put together a pretty large test harness for putting semantic triplestores through their paces. The effort was highly dependent on setup and teardown of the external environment.
For instance, in order to run a single test that involves making an http request to a triplestore-backed web application, I had to write code to start the virtual machine my web application and triplestore images were installed on, restore that vm to a known-state snapshot, wait for the guest operating system to come online, login to the web application and establish a session, and, oh yeah, actually make the http request to see the results I was really interested in.
Ultimately, we went with a design that passed a lot of maps around, but tonight I found myself reviewing some of the other possibilities we could have pursued. Clojure maintains great separation of concerns by allowing you to pass functions around. For example, here’s the function I originally wrote to setup and teardown a virtual machine, running some arbitrary function f in-between:
(defn vm-setup [vm snapshot f]
(restore-snapshot vm snapshot)
(start-vm vm)
(try
(f)
(catch Exception e
(.printStackTrace e))
(finally
(save-vm-state vm))))
The function knows nothing about what f does, and so f can be anything.
After the call to start-vm, VirtualBox launches immediately, but it often takes 10-15 seconds for the guest OS to start running and communicating. Executing f immediately after starting the vm will often fail if you don’t first wait for the guest OS to come online. I could write that sort of functionality into the vm-setup function, but that’s mixing concerns, and I could just as easily write another setup function that tries to ping the guest OS, executing its payload once the machine has started communicating:
(defn host-reachable? [host timeout f]
(if (ping host timeout)
(f)
(throw (Exception. (str "Could not reach host: " host " within " timeout " ms")))))
This pattern goes on. Conceptually, I ended up with a single test being expressed like this:
(vm-setup "WebAppVM1" "CleanSnapshot"
(host-reachable? "hostname" (* 30 1000)
(webapp-login "hostname" "username" "password"
(times 30
(timethis
(http-request "/some/webapp/path"))))))
Which is pretty readable, and its structure matches its meaning. The test itself is the call to

(http-request "/some/webapp/path")

and it’s nested pretty deeply in an execution context.
If we needed to reuse all of those steps, we could make it into a composite setup function:
(defn start-vm-and-login [f]
(vm-setup "WebAppVM1" "CleanSnapshot"
(host-reachable? "hostname" (* 30 1000)
(webapp-login "hostname" "username" "password"
(times 30
(timethis
(f)))))))
The code above, however, won’t run. The forms aren’t function objects, they’re function invocations that return values. We’d have to wrap each of them in (fn [] ) in order to lambda-ize them:
(vm-setup "WebAppVM1" "CleanSnapshot"
(fn []
(host-reachable? "hostname" (* 30 1000)
(fn []
(webapp-login "hostname" "username" "password"
(fn []
(times 30
(fn []
(timethis
(fn [] (http-request "/some/webapp/path")))))))))))
We can write our own macro to take the first version and expand into the second, and it helps if we notice how similar it is to the ->> macro. In fact, we could write:
(->> (http-request "/some/webapp/path")
(fn [])
(timethis)
(fn [])
(times 30)
(fn [])
(webapp-login "hostname" "username" "password")
(fn [])
(host-reachable? "hostname" (* 30 1000))
(fn [])
(vm-setup "WebAppVM1" "CleanSnapshot"))
Which reads a little bit backwards, but provides a great target for our macro to generate:
(defmacro fixture [& forms]
`(->> ~@(interpose '(fn []) (reverse forms))))
Allowing us to write:
(fixture (vm-setup "WebAppVM1" "CleanSnapshot")
(host-reachable? "hostname" (* 30 1000))
(webapp-login "hostname" "username" "password")
(times 30)
(timethis)
(http-request "/some/webapp/path"))
Not bad.
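For comparison, the same interpose-and-reverse wrapping can be done without macros in any language with first-class functions. Here’s a hypothetical Python sketch (all names are mine) where every setup step is a function that accepts a zero-argument thunk:

```python
def fixture(*steps):
    """Nest setup steps outside-in. All but the last argument are
    functions taking a thunk; the last is the innermost body itself."""
    *wrappers, body = steps
    thunk = body
    for wrap in reversed(wrappers):
        # bind the current wrapper/thunk pair into a new closure
        thunk = (lambda w, t: lambda: w(t))(wrap, thunk)
    return thunk

# usage: each step decides when (and whether) to invoke its thunk
log = []

def vm_setup(f):
    log.append("vm up")
    try:
        f()
    finally:
        log.append("vm saved")

def login(f):
    log.append("logged in")
    f()

fixture(vm_setup, login, lambda: log.append("request"))()
```

Each wrapper controls when, whether, and how many times its thunk runs, which is exactly the property the Clojure version relies on.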
Notes to self: Keep SVN in sync w/ Git The Poor Man’s Way
There is probably a much better/faster way to do this leveraging git-svn, but for the poor fool who has to keep an SVN repo in sync w/ the development Git repo for client consumption … here are a couple of helpful one-liners…
Copy the contents of the Git repo directory into the SVN repo directory (skipping Git’s own metadata)

rsync -r --exclude=.git $GIT_DIR/ $SVN_DIR/
Detect any non-.svn differences between the directories after the copy (a good sanity check)
diff -r $GIT_DIR $SVN_DIR | grep -v '\.svn' | wc
Porting Haskell’s Parsec
Parsec is a parser-combinator library. Parser combinators are built around the idea of making a bunch of very small and focused parsers and combining them using operators that one would more usually see in regular expressions, ultimately leading to parsers that feel more like function calls in a program than a stiff declaration of a grammar. Parsec is king in Haskell-land. In Clojure, however, there are a number of libraries, fully- and not-so-fully written, that can be used to write parsing programs. fnparse, Amotoen, clarsec, parser, and clj-peg are just a few; feel free to mention your favorites in the comments. I don’t mean to leave any out, but rather to point out that what I’m doing here is not new. I do hope it’s illuminating for some.
Parsec, as I see it, boils down to 2 ideas.
A parser either consumes input or it doesn’t: Consumed or Empty.
A parser either succeeds in parsing or it fails: Ok or Err.
These outcomes can be combined into 4 continuation functions that are passed to every parser:
cok – Consumed & Ok
cerr – Consumed & Err
eok – Empty & Ok
eerr – Empty & Err
As for errors, Parsec defines two types of them. Those that we can say something about, and those that we can say nothing about. These are errors with messages and unknown errors, respectively. Of the errors that we can say something about, some are the result of not finding input that the parser was expecting, which lead to messages like “expected ‘a’ and found ‘b'”, and some are the result of not finding input where we expected to, which lead to messages like “unexpected end of input”.
Finally, Parsec keeps tabs on the thing it’s parsing, it maintains state. The state is made up of 2 elements, the input stream itself, of which a Clojure seq models nicely and the current source position, itself made up of the name of the input, and one’s current line and column location in it.
The Most Basic Parsers
The simplest parser is the one that, no matter what, returns a constant value. This is called parserReturn in Haskell, but in Clojure it’s more akin to the constantly function, so I’ve named it always. Here’s its simplified implementation:
(defn always [x]
(fn [state cok cerr eok eerr]
(eok x state)))
This implementation makes sense. No matter what, it returns a new parser. A parser is merely a fn that takes a state and 4 continuations. The always parser always calls the Empty & Ok continuation. Nothing was removed from the stream (hence the Empty part), and everything should continue on as normal (the Ok part).
Equally simple is the parser that always fails. This is called parserZero in Haskell, since it represents a “nothing” parser.
(defn parser-zero []
(fn [state cok cerr eok eerr]
(eerr (unknown-error state))))
(defn unknown-error [{:keys [pos] :as state}]
(ParseError. pos []))
More Interesting Parsers
One of the more basic parsers in Parsec is tokenPrim, which processes a single element from the underlying stream. It unconses the first element from the head of the input, tests whether it is supposed to be consumed, and then updates the state’s current position in the input. To do this, it takes 3 functions:

nextpos – calculates a new source position based on the item consumed and the old position.
test – takes a single element from the underlying stream and returns whether or not to consume it.
showToken – is used to create readable error messages by returning a string representation of stream elements.
(defn token-prim [show-f nextpos-f consume?]
(fn [{:keys [input pos] :as state} cok cerr eok eerr]
(if-let [s (seq input)]
(let [item (first s)
rest-of-input (next s)]
(if (consume? item)
(let [newpos (nextpos-f pos item rest-of-input)
newstate (InputState. rest-of-input newpos)]
(cok item newstate))
(eerr (unexpect-error (show-f item) pos))))
(eerr (unexpect-error "" pos)))))
There are three ways the above function continues. Two are through eerr: one when there is nothing left in the seq when we were expecting to parse something, and one when we did parse something but our test told us not to consume it. In the second case we can produce a decently readable description of the item so that we can later present it to the user. Finally, if our test tells us to go ahead and consume the item, we call cok, passing it the item and a newly calculated state with a new position and the input without our consumed item on the front.
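To see the protocol outside of Clojure, here’s a hypothetical Python transliteration of the same single-token parser, with a tiny run harness. This is a sketch, not the real Parsec or the code above: state is just an (input, position) pair, and char_p and run are names I made up.

```python
def token_prim(show_f, nextpos_f, consume_p):
    """Parse one element from the input stream, continuation-passing style."""
    def parser(state, cok, cerr, eok, eerr):
        input_, pos = state
        if not input_:
            return eerr(("unexpected end of input", pos))
        item, rest = input_[0], input_[1:]
        if consume_p(item):
            # consumed: hand the item and the advanced state to cok
            return cok(item, (rest, nextpos_f(pos, item, rest)))
        return eerr(("unexpected " + show_f(item), pos))
    return parser

def char_p(c):
    """A parser for one specific character."""
    return token_prim(repr, lambda pos, item, rest: pos + 1,
                      lambda item: item == c)

def run(p, text):
    ok = lambda item, state: ("ok", item, state)
    err = lambda e: ("err", e)
    return p((text, 0), ok, err, ok, err)
```

Running char_p("a") against "abc" consumes the "a" and leaves ("bc", 1) as the new state, while running it against "xyz" falls through to eerr.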
There are a lot of parsers we can implement on top of token-prim; however, it’s got no brain. You can only line up a number of token parsers one after another and let them tell you if the input matched in the order you thought it would. We can’t express the idea of “or” with it. For that, Parsec relies on the parserPlus parser. It’s called “plus” because it’s used to glue multiple parsers into a single one, analogous to how addition of numbers glues them all together into a new, single number (I never used to think about things like this. Haskell has made me re-understand everything I already knew).

The strategy for implementing parserPlus is that it will take 2 parsers and try the first one. If that succeeds, we’ll go with that. If it doesn’t, we try the second one, and if it succeeds we want our combined parser to be indistinguishable from that second parser. If neither works, then our parser didn’t work and we want to escape like any other parser would if it failed. Calling the first parser is easy. For the sake of staying close to the original Haskell, we’ll call this parser m. Parsers in Haskell and Clojure are simply functions, so in order to try it, we can invoke it and pass the current state and the 4 continuations it expects.
The continuations are our hook to intercept failures. We know that if m fails, it will call the fourth continuation we pass it. So to try the second parser, n, we’re going to wrap the eerr function (the 4th continuation), trying that second parser before giving up and calling eerr. Here’s how it looks in Clojure:
(defn parser-plus [m n]
(fn [state cok cerr eok eerr]
(letfn [(meerr [err]
(letfn [(neok [item state-prime]
(eok item state-prime))
(neerr [err-prime]
(eerr (merge-error err err-prime)))]
(n state cok cerr neok neerr)))]
(m state cok cerr eok meerr))))
The locally nested functions aren’t exactly readable at a glance, but combined with the knowledge of what’s happening, it’s a really elegant way to express the idea. Also, as a small note, there aren’t great names for some of the nested function parameters. state-prime and err-prime? Well, that’s a holdover from Haskell to express that the thing is an altered version of the thing it came from. In mathematics, this is expressed as a tick, state' and err'. Those aren’t legal Clojure 1.2 identifiers, so I opted to be verbose. Starting with the Clojure 1.3 alphas available now, tick is a legal constituent character, which means you can use it anywhere in an identifier except as the first character.
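The same eerr-wrapping trick, again as a hypothetical Python sketch: fail and always stand in for real parsers, errors are plain lists, and list concatenation stands in for merge-error.

```python
def always(x):
    """Succeed with x without consuming input (Empty & Ok)."""
    return lambda state, cok, cerr, eok, eerr: eok(x, state)

def fail(msg):
    """Fail without consuming input (Empty & Err)."""
    return lambda state, cok, cerr, eok, eerr: eerr([msg])

def parser_plus(m, n):
    """Try m; if it fails without consuming input, try n from the same
    state, merging both error lists if n fails too."""
    def parser(state, cok, cerr, eok, eerr):
        def meerr(err):
            def neerr(err_prime):
                return eerr(err + err_prime)  # stands in for merge-error
            return n(state, cok, cerr, eok, neerr)
        return m(state, cok, cerr, eok, meerr)
    return parser

def run0(p):
    """Run a parser that never looks at its state."""
    ok = lambda value, state: ("ok", value)
    err = lambda e: ("err", e)
    return p(None, ok, err, ok, err)
```

If m succeeds, n is never consulted; if m fails empty, n gets a fresh shot from the very same state.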
The last parser I’d like to tackle in this blog post is manyAccum. This parser wraps behavior around an existing parser and so becomes a tangle of continuation functions just like parser-plus was, but unlike parser-plus, manyAccum only accepts one parser and attempts to apply it 0 or more times. This is the parser equivalent of the Kleene star operator.
Just like parser-plus, we’re going to invoke the parser manyAccum is given and create a new parser by manipulating the continuations we pass to it. Specifically, if the parser we’re given fails to consume any input (calls eerr), we’re going to hijack that and report that it was instead an eok with an empty list. If the parser succeeds in consuming input, we’re going to try to get it to do it again. And again. And again forever. Here’s what it looks like:
(defn many-accum [p]
(fn [state cok cerr eok eerr]
(letfn [(many-err [err] (throw (RuntimeException. "combinator '*' is applied to a parser that accepts an empty string")))
(continue [coll item state-prime]
(p state-prime (partial continue (cons item coll)) cerr many-err (fn [_] (cok (cons item coll) state-prime))))]
(p state (partial continue (seq [])) cerr many-err (fn [_] (eok [] state))))))
We define many-err to immediately quit with an exception if the third continuation, eok, is called, since that means that p accepts empty strings and would spin forever if we let it. The only other trick to many-accum is that we create continue to accumulate items by first calling it with an empty seq, (seq []), and then consing further consumed items onto the front. Haskell’s manyAccum takes a cons-like operator in addition to p as a more flexible way of creating a list of elements.
A Final Note
I intentionally stayed away from Monads in this post (which is no easy task when porting Haskell), averting my eyes from Konrad Hinsen’s clojure.contrib.monad and trying wherever possible to make Clojure functions feel less like Haskell functions obsessed with parentheses. Not because Monads are particularly special or complex, but rather just the opposite. Monads fall out of designs that favor composability and uniformity. The first parser of this post, always, is half of an implementation of Monad. parser-zero and parser-plus are 100% of a smaller class of monads called MonadPlus. Reading clj-http’s source, I felt like it was such clean and idiomatic Clojure, with fantastic composability properties that made it easy to build on top of, but also like it would be very easy to express in Haskell and not feel forced or awkward. So it’ll be interesting to finish this port and see if I can succeed in doing the same in the opposite direction.