Two Guys Arguing

Mocking frameworks are like cologne

Posted in software testing by benjaminplee on 11.20.10

Mocking frameworks are like cologne …

they help make your tests smell better, but they can cover up some really stinky code.

I was working my way through some old bookmarked blog posts when I ran across Decoupling tests with .NET 4 by Mr. Berridge, and it reminded me of a discussion I had a few weeks back while teaching a refactoring workshop.  The discussion centered around the question: When should I use a mocking framework, and when should I use real objects or hand-coded stubs?

*** I will leave the questions of the differences between stubs & mocks and the merits of testing in true isolation versus using real objects for another day ***

Mocking frameworks like Mockito or Moq are great tools, but lately I have been finding myself using them less and less.  I have found that the more I use mocking frameworks, the more easily I break the Single Responsibility Principle.  Now, this obviously isn’t the fault of the tool.  It is my fault.  And my pair’s.  The real problem is that we grew accustomed to using the tool and began to lean on it as a crutch.  As we added more features our code took on more and more responsibilities, but we didn’t feel the pain directly because Mockito was doing more and more of the heavy lifting.

In Kevin’s post, he specifically talks about TDDing an object with many dependencies and the pain of updating each test as new dependencies were added.  Now, I have a ton of respect for Kevin and his code … but I wonder if another solution to his problem would be to re-examine why the object needed so many dependencies.  Is it doing too much?  Could these dependencies be re-described and possibly combined into the overall roles they fulfill?

Kevin’s solution was clever and a nice tip to keep in my pocket.  The devil is in the details.  As I “follow the pain” I am trying to write more hand-coded stubs and to stop and re-evaluate any time I find myself relying on side-effects to verify how my code works.  Wish me luck.

Copyright 1988

Posted in challenge by youngnh on 11.19.10

I found this in the book I’m working my way through. It’s a bit obvious, but see if you can’t tell what it does:

10   REM unknown
20   INPUT L$(1), L$(2), L$(3), L$(4)
30   PRINT
40   FOR I1=1 TO 4
50   FOR I2=1 TO 4
60   IF I2=I1 THEN 130
70   FOR I3=1 TO 4
80   IF I3=I1 THEN 120
90   IF I3=I2 THEN 120
100   LET I4=10 - (I1 + I2 + I3)
110   LPRINT L$(I1); L$(I2); L$(I3); L$(I4);
120   NEXT I3
130   NEXT I2
140   NEXT I1
150   END

The book is Umberto Eco’s Foucault’s Pendulum.

When IS the last responsible moment?

Posted in software development by benjaminplee on 11.18.10
XKCD: Responsible Behavior

As agile software developers we work hard to make the development process as flexible and lean as possible for our customers.  We work hard to estimate accurately, deliver working code quickly, and only do the work needed to finish the story right in front of us.  This discipline is hard, but it keeps our customers in the loop and getting the most for their money/time.  In order to keep up with the customer’s ever-changing requirements, we often push off decisions to the last responsible moment.  By delaying expensive decisions until our hand is forced, we reduce waste and can stay focused on delivering value with each finished story.

But, when is the last responsible moment?  Who decides?  Has it changed?

With our heads down in the trenches, knocking out stories left and right, and moving fast toward our client’s vision … we have to keep these questions in mind.  A recent project reminded me of this.  Our team did a great job of empowering our client to shift our priorities as their business understanding changed.  We offered options, gave accurate estimates for stories, and allowed them to shift the work at hand each iteration.  We did exactly what we had set out to do.

Unfortunately, we were so focused on short-term flexibility and options that the higher level vision began to slip, and some decisions were made AFTER the last responsible moment.

The key word here is responsible.  While we work to increase our productivity, reduce waste, and keep our customer’s immediate desires met … we need to keep in mind that the short-term decisions add up.  Far-off decisions and the high-level vision don’t need to be fully spec’ed and decided on day 1, but just as we maintain our suite of tests as requirements change, short-term decisions need to be reconciled with the high-level vision.  A great team keeps in mind when this last responsible moment is for each kind of decision and revisits those decisions frequently.

</rant>


Chasing the Sun

Posted in challenge by youngnh on 11.17.10

As an incomplete response to Ben’s long-ago challenge, “Long Walk After the Sun”, I found 2 websites that gave me enough information to start calculating the answer.

The first is the Naval Oceanography Portal, a veritable plethora of astronomical almanacs, calculators and errata. Their Sun Altitude/Azimuth Table Calculator is particularly useful: plug in a date and a town in the United States and it will calculate for you, down to minute resolution, exactly where in the sky you will find the sun from sunup to sundown.

Using their software to generate a table of headings, you would then know exactly which direction your hypothetical man would be traveling in at any given time of day.

The next useful online calculator I found was on the Federal Communications Commission’s website, of all places. It has a calculator which will determine “Terminal Coordinates Given a Bearing and a Distance”. You plug in a starting latitude and longitude, as well as a distance to travel and your bearing, and it will spit out the latitude and longitude you’ll end up at.

From Wikipedia, humans walk at a pace of about 5km/h, or about 83 meters per minute. Given a set of starting coordinates, you could look up a heading from the Azimuth/Altitude chart, and set the distance traveled to (83 meters x the number of minutes walking) and get a new location. If you had the patience to calculate a very large number of items, you could trace the path from sunrise on January 1st, 2009 to sundown on December 31st, 2009.

Or someone could write a scraper. Or better yet, look up the algorithms these sites use to calculate these sort of things and not hammer their servers for curiosity’s sake.
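I couldn’t resist sketching the stepping code anyway. Here’s a minimal Clojure version of a single leg of the walk, using the standard destination-point formula on a spherical earth. The FCC calculator models the earth’s actual shape, so treat this as an approximation; the names and constants here are mine, not theirs.

(def earth-radius 6371000.0) ; mean radius in meters

;; one leg of the walk: from lat/lon (radians), travel dist meters on
;; the given bearing (radians east of north) over a spherical earth
(defn destination-point [lat lon bearing dist]
  (let [d    (/ dist earth-radius)
        lat2 (Math/asin (+ (* (Math/sin lat) (Math/cos d))
                           (* (Math/cos lat) (Math/sin d) (Math/cos bearing))))
        lon2 (+ lon (Math/atan2 (* (Math/sin bearing) (Math/sin d) (Math/cos lat))
                                (- (Math/cos d) (* (Math/sin lat) (Math/sin lat2)))))]
    [lat2 lon2]))

;; ten minutes of walking at 83 m/min from roughly 38 38 N, 90 15 W on
;; the 105.8 degree bearing in the table below
(map #(Math/toDegrees %)
     (destination-point (Math/toRadians 38.6333) (Math/toRadians -90.25)
                        (Math/toRadians 105.8) (* 83 10)))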

I did a few calculations by hand (and by “by hand” I mean I copied and pasted numbers into web forms) and here’s the track of a person following the sun today, November 17th, 2010, from sunrise at 5:50 AM until he gets tired and passes out at 8:00 AM.

Time		Latitude		Longitude		Bearing (E of N)
05:50		38 38  0 N		90 15 0  W		105.8
06:00		38 37 53 N		90 14 27 W		107.3
06:10		38 37 45 N		90 13 54 W		108.8
06:20		38 37 36 N		90 13 21 W		110.2
06:30		38 37 27 N		90 12 49 W		111.8
06:40		38 37 17 N		90 12 17 W		113.3
06:50		38 37 06 N		90 11 45 W		114.8
07:00		38 36 55 N		90 11 14 W		116.4
07:10		38 36 43 N		90 10 43 W		118
07:20		38 36 30 N		90 10 13 W		119.6
07:30		38 36 17 N		90  9 43 W		121.3
07:40		38 36 03 N		90  9 14 W		123
07:50		38 35 48 N		90  8 45 W		124.7
08:00		38 35 33 N		90  8 17 W		126.5

Something that stands out is that in the northern hemisphere, the sun sweeps an arc from something like 60 to 300 degrees E of N in a single day. That’s during the summer. In the winter months the sun is lower in the sky (it spends less time above the horizon, so less time walking) and sweeps a smaller arc, but one still mostly centered south. Also, the bearing of the sun changes more slowly around noon. I’d venture to guess that the time you’d spend walking east and west wouldn’t be as long as the time you spent walking mostly south. In a single day, a person wouldn’t cover very much ground over the surface of the earth. There are over 100 kilometers between whole degrees of latitude, so even traveling in straight lines it’d take close to 2 ten-hour days to drop a degree of latitude (at 5 km/h for ten hours, that’s only 50 km a day against roughly 111 km per degree). Over the course of an entire year, though, that might add up to some serious displacement.

I’m now very interested in this problem, and as a shot-in-the-dark guess, I’m going to say that our wandering man would end up somewhere on the line where the sun traces a perfect east-to-west path overhead, as north of it he’d be drawn south and south of it he’d be drawn north. I’m not sure that’s the equator, though, since the earth is tilted on its axis. Ah, more problems already brewing and I haven’t even properly finished this one yet. Time to stop typing.

A few other websites I stumbled across researching this post that provide some illuminating links:
The Griffith Observatory in Los Angeles, in particular this page which turned me on to the possibility of not having to derive all the math myself because humans have been looking at the sky since the dawn of their existence and who would’ve thunk that at least one of those dirty apes thought to write down what they saw.
This website’s FAQ has a tip for a dead-tree book with Astronomical Algorithms in it.


HTML 5 Game of Life – Canvas Tag

Posted in javascript by benjaminplee on 11.16.10
Screenshot of HTML5-GoL animation

In a previous post I described how I created a simple version of John Conway’s Game of Life using HTML 5 Web Workers for multi-threaded matrix calculations: HTML5-GoL.  Once each new matrix is calculated, I need to display it somewhere.  In order to keep rendering (and therefore UI-thread overhead) to a minimum, I decided to only pass back the delta from the previous matrix to the current one (those squares with new life and those recently deceased).  The end result was an implementation that could render fast enough for smooth transitions.

Canvas Tag – Simple 2D bitmap graphics

The Canvas tag was first introduced by Apple within WebKit and allows for a simple block area in which to render simple 2D bitmap graphics.  This differs from SVG implementations, which render individual graphic primitives that can be re-rendered and modified as DOM elements.  Canvas tag graphics are limited to simple drawing, composition, translation, and image manipulation.

In order to draw each Game of Life matrix, we first need the canvas area:

<!DOCTYPE html>
<canvas id="gol_canvas" width="401" height="401"></canvas>

Next we need to get the 2D context from the canvas DOM element.

var context = document.getElementById('gol_canvas').getContext('2d')

Once we have the 2D context we can start drawing simple shapes (lines, blocks, circles, etc.) or perform any number of other 2D actions.  For this simple Game of Life example I created a new GoLCanvas object which holds all of the GoL-specific actions (e.g. create life at x,y, clear the board, etc.).  Overall the drawing API is simple, if a bit archaic (it reminds me of my CS course in computer graphics where we did very basic OpenGL commands in C++).

Taken from gol-canvas.js

var draw_background = function() {
  context.clearRect(0, 0, size, size);
  context.fillStyle = COLORS.BACKGROUND;
  context.fillRect(0, 0, size, size);
};

The canvas work was simple and effective, and it only touches the surface of what can be done.  There are a ton of great tutorials on the canvas tag and what can be done with it.

symbol-macrolet and inside-out test fixtures

Posted in clojure, common lisp by youngnh on 11.15.10

Rolling right along with more ways to write fixtures, let’s say we start with a test that looks like this:

(deftest test-convoluted
  (with-connection hdb
    (try
      (do-commands "create table person (name varchar(255))")
      (insert-values :person [:name] ["bill"] ["joey"])
      (with-query-results results ["select name from person"]
    (is (= "bill" (:name (first results))))
    (is (= "joey" (:name (second results)))))
      (finally
       (do-commands "drop schema public cascade")))))

It does its setup and teardown inside of the deftest itself. The actual “testing” parts are the two (is) forms deeply nested inside. I’ve already written a post on how the clojure.test lib that comes with Clojure addresses this kind of complexity by providing fixtures. However, they’re a bit clumsy and aren’t particularly fine-grained. You can either run them once before (and after) all tests run, or once before and after each test runs.

The Common Lisp testing framework FiveAM has a different approach to defining and using fixtures. The FiveAM library defines two macros, def-fixture and with-fixture, which define a fixture form and execute forms inside of a named fixture, respectively.

To define a fixture that takes care of the setup and teardown in the test above, we would write something like this:

(def-fixture person-schema []
  (with-connection hdb
    (try
      (do-commands "create table person (name varchar(255))")
      (insert-values :person [:name] ["bill"] ["joey"])
      (with-query-results results ["select name from person"]
        (test-body))
      (finally
        (do-commands "drop schema public cascade")))))

In the above, test-body is a special captured variable that will be replaced with whatever forms you later specify. It’s where the meat of your test will go. You specify what to run there as the body of the with-fixture macro, thusly:

(deftest test-ideal
  (with-fixture person-schema []
    (is (= "bill" (:name (first results))))
    (is (= "joey" (:name (second results))))))

The with-fixture form names the fixture we want to use, and takes care of expanding it in such a way that the variables are still checked by the compiler. If our fixture didn’t declare results in the scope that our assertions were expanded to, the compiler would complain just as if we had written the whole thing out by hand.

We can make the fixture in our example even more flexible. def-fixture can declare arguments and with-fixture can provide them. Altering our setup slightly to add a second column, we can then pass in a specific query to be run per fixture:

(def-fixture person-schema [query]
  (with-connection hdb
    (try
      (do-commands "create table person (name varchar(255), age integer)")
      (insert-values :person [:name :age] ["bill" 25] ["joey" 35])
      (with-query-results results [query]
        (test-body))
      (finally
       (do-commands "drop schema public cascade")))))

And then we can get a lot more mileage out of our solitary fixture:

(deftest test-ideal
  (testing "name column"
   (with-fixture person-schema ["select name from person"]
     (is (= "bill" (:name (first results))))
     (is (= "joey" (:name (second results))))))

  (testing "age column"
    (with-fixture person-schema ["select age from person"]
      (is (= 25 (:age (first results))))
      (is (= 35 (:age (second results)))))))

This is a trick that defmacro can’t easily perform. If we wanted to define person-schema as a macro, in order to capture results, we’d have to put ~'results somewhere in a backtick form. It’ll work, but for any significant number of capturing symbols, there are tildes and ticks everywhere. For a feature that would ostensibly have users writing lots and lots of their own expansions, that’s a major drawback, in my opinion. Early versions of the newly re-written ClojureQL had users do this, I believe, with little snippets of macros sprinkled throughout their code. It turns out that we can have our cake and eat it too, and you’ve probably guessed how from the title of this post.

I copped the implementation straight from FiveAM. Common Lisp has an advantage here, as the language has built-in local macros (Clojure’s are global to a namespace) and symbol macros, which FiveAM uses to great effect. However, Konrad Hinsen (of clojure.contrib.monad fame) has implemented local and symbol macros in the clojure.contrib.macro-utils lib. I used that.

My Clojure implementation consists of the two macros and a global dynamic variable that holds an atom mapping fixture names to their arguments and bodies. The def-fixture macro takes care of assoc-ing them as they are defined. The with-fixture macro pulls that fixture definition and constructs an anonymous function from it, as well as making test-body a symbol macro that expands to the given body of with-fixture.

(use 'clojure.contrib.macro-utils) ; for symbol-macrolet

(def *fixtures* (atom {}))

(defmacro def-fixture [name args & body]
  `(swap! *fixtures* assoc '~name (cons '~args '~body)))

(defmacro with-fixture [name args & body]
  (let [[largs & lbody] (get @*fixtures* name)]
    `(symbol-macrolet [~'test-body (fn [] ~@body)]
       ((fn ~largs ~@lbody)
        ~@args))))

I could have used a local macro, by swapping the symbol-macrolet form for a macrolet form, but then I would have had to quote the body parameter passed to with-fixture. By using a symbol macro and asking that users treat it like a fn, I can avoid that. It’s a small thing and either way works. For 8 lines of code overall, these 2 macros add a lot of flexibility to how you can define test fixtures.
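To make the expansion concrete, here is roughly what a with-fixture call from the example above turns into, with the test-body symbol macro already substituted in (hand-expanded, so the details are approximate):

;; (with-fixture person-schema ["select name from person"]
;;   (is (= "bill" (:name (first results)))))
;; becomes, roughly:
((fn [query]
   (with-connection hdb
     (try
       (do-commands "create table person (name varchar(255), age integer)")
       (insert-values :person [:name :age] ["bill" 25] ["joey" 35])
       (with-query-results results [query]
         ((fn [] (is (= "bill" (:name (first results)))))))
       (finally
         (do-commands "drop schema public cascade")))))
 "select name from person")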


Tie Git Submodules to a Particular Commit or Branch

Posted in git, software development by benjaminplee on 11.14.10

While working on getting QUnit-CLI cleaned up and refactored a bit, I realized I needed to tie the example code in the Git repository to a particular version of QUnit.js (those guys are making changes too fast for me to keep up).  I have used SVN:externals previously, so Git submodules seemed like an obvious solution.  A single submodule should allow me to keep QUnit-CLI pointing to a particular revision of QUnit.js without requiring me to separately document which version I was testing against.

The man page for git-submodule as well as the Git Book chapter on submodules do a good job of documenting the command with some simple examples, but none that were 100% clear for my needs.  In addition, I needed my Git submodule to point to a specific commit (or branch) so that everyone cloning my code can consistently run my examples w/o fear that a new commit on HEAD will break something.

Step 1 : Add the submodule

Once the main repository is checked out, I need to add the QUnit submodule.  First grab the GitHub URL for my QUnit fork (eventually this will be replaced with the main QUnit repo) and execute the “add” command from within your local repository root.

git submodule add git://github.com/asynchrony/qunit.git qunit

Afterward there will be two modified and staged objects in your repo: .gitmodules, which contains the submodule’s local path and source URL, and a new folder named qunit, which contains a full clone of the source repository.

** Fraser Speirs has a good writeup on what is going on behind the scenes with the Git internals and how the key to all of this is in the index files of each repo and the modes the changes are committed with. **

Step 2 : Fix the submodule to a particular commit

By default the new submodule will be tracking HEAD of the master branch, but it will NOT be updated as you update your primary repo.  In order to change the submodule to track a particular commit or a different branch, change directory into the submodule folder and switch branches just like you would in a normal repo.

git checkout -b dev_branch origin/dev_branch

Now the submodule is fixed on the development branch instead of HEAD of master.  Just as easily I could have set it to a specific commit or tag.

Step 3 : Commit everything

Now, from your primary repository, you still have two modified objects: the .gitmodules file and the qunit folder.  Committing these changes will persist the new submodule tracking your desired branch.
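Something like the following does it (the commit message is mine):

git add .gitmodules qunit
git commit -m "Pin qunit submodule to dev_branch"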

Step 4 : Clone Recursive

The next time you (or someone else) clones this repo, they will need to do one of two things.

A) Add the --recursive flag to their git clone command

git clone REPO_URL --recursive

B) Manually initialize and update the submodules after the clone

git clone REPO_URL
git submodule update --init --recursive

Fixture Macros

Posted in clojure by youngnh on 11.13.10

A few months ago at Revelytix, we put together a pretty large test harness for putting Semantic Triplestores through their paces. The effort was highly dependent on setup and teardown of the external environment.

For instance, in order to run a single test that involves making an http request to a triplestore-backed web application, I had to write code to start the virtual machine my web application and triplestore images were installed on, restore that vm to a known-state snapshot, wait for the guest operating system to come online, login to the web application and establish a session, and, oh yeah, actually make the http request to see the results I was really interested in.

Ultimately, we went with a design that passed a lot of maps around, but tonight I found myself reviewing some of the other possibilities we could have pursued. Clojure maintains great separation of concerns by allowing you to pass functions around. For example, here’s the function I originally wrote to set up and tear down a virtual machine, running some arbitrary function f in between:

(defn vm-setup [vm snapshot f]
  (restore-snapshot vm snapshot)
  (start-vm vm)
  (try
    (f)
    (catch Exception e
      (.printStackTrace e))
    (finally
      (save-vm-state vm))))

The function knows nothing about what f does and so f can be anything.
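For instance, a hypothetical one-off (any zero-argument fn works as the payload):

;; hypothetical usage, not from the real suite
(vm-setup "WebAppVM1" "CleanSnapshot"
          (fn [] (println "the vm is up; do anything here")))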

After the call to start-vm, VirtualBox launches immediately, but it often takes 10-15 seconds for the guest OS to start running and communicating. Executing f immediately after starting the vm will often fail if you don’t first wait for the guest OS to come online. I could write that sort of functionality into the vm-setup function, but that’s mixing concerns, and I could just as easily write another setup function that tries to ping the guest OS, executing its payload once the machine has started communicating:

(defn host-reachable? [host timeout f]
  (if (ping host timeout)
    (f)
    (throw (Exception. (str "Could not reach host: " host " within " timeout " ms")))))

This pattern goes on. Conceptually, I ended up with a single test being expressed like this:

(vm-setup "WebAppVM1" "CleanSnapshot"
  (host-reachable? "hostname" (* 30 1000)
    (webapp-login "hostname" "username" "password"
      (times 30
        (timethis
          (http-request "/some/webapp/path"))))))

Which is pretty readable, and its structure matches its meaning. The test itself is the call to

(http-request "/some/webapp/path")

and it’s nested pretty deeply in an execution context.

If we needed to reuse all of those steps, we could make it into a composite setup function:

(defn start-vm-and-login [f]
  (vm-setup "WebAppVM1" "CleanSnapshot"
    (host-reachable? "hostname" (* 30 1000)
      (webapp-login "hostname" "username" "password"
        (times 30
          (timethis
            (f)))))))

The code above, however, won’t run. The forms aren’t function objects, they’re function invocations that return values. We’d have to wrap each of them in (fn [] ) in order to lambda-ize them:

(vm-setup "WebAppVM1" "CleanSnapshot"
  (fn []
    (host-reachable? "hostname" (* 30 1000)
      (fn []
        (webapp-login "hostname" "username" "password"
          (fn []
            (times 30
              (fn []
                (timethis
                  (fn [] (http-request "/some/webapp/path")))))))))))

We can write our own macro to take the first version and expand into the second, and it helps if we notice how similar it is to the ->> macro. In fact, we could write:

(->> (http-request "/some/webapp/path")
     (fn [])
     (timethis)
     (fn [])
     (times 30)
     (fn [])
     (webapp-login "hostname" "username" "password")
     (fn [])
     (host-reachable? "hostname" (* 30 1000))
     (fn [])
     (vm-setup "WebAppVM1" "CleanSnapshot"))

Which reads a little bit backwards, but provides a great target for our macro to generate:

(defmacro fixture [& forms]
  `(->> ~@(interpose '(fn []) (reverse forms))))

Allowing us to write:

(fixture (vm-setup "WebAppVM1" "CleanSnapshot")
         (host-reachable? "hostname" (* 30 1000))
         (webapp-login "hostname" "username" "password")
         (times 30)
         (timethis)
         (http-request "/some/webapp/path"))

Not bad.
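For the skeptical, macroexpanding a cut-down, hypothetical three-step fixture shows the fns being threaded in exactly as we wrote them by hand:

(macroexpand-1
 '(fixture (times 30)
           (timethis)
           (http-request "/some/webapp/path")))
;; => (clojure.core/->> (http-request "/some/webapp/path")
;;                      (fn [])
;;                      (timethis)
;;                      (fn [])
;;                      (times 30))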


Notes to self: Keep SVN in sync w/ Git The Poor Man’s Way

Posted in Uncategorized by benjaminplee on 11.12.10

There is probably a much better/faster way to do this leveraging git-svn, but for the poor fool who has to keep an SVN repo in sync w/ the development Git repo for client consumption … here are a couple of helpful one-liners…

Copy from the Git repo directory to the SVN repo directory

rsync -r $GIT_DIR $SVN_DIR
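If the .git folder itself shouldn’t come along for the ride (a likely assumption), rsync can exclude it; note the trailing slash, which copies the directory’s contents rather than the directory itself:

rsync -r --exclude='.git' $GIT_DIR/ $SVN_DIR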

Detect whether there are any non-.svn folder differences between the directories afterward (a good sanity check)

diff -r $GIT_DIR $SVN_DIR | grep -v '\.svn' | wc

Porting Haskell’s Parsec

Posted in clojure, haskell by youngnh on 11.11.10

Parsec is a parser-combinator library. Parser combinators are built around the idea of making a bunch of very small and focused parsers and combining them using operators that one would more usually see in regular expressions, ultimately leading to parsers that feel more like function calls in a program than a stiff declaration of a grammar. Parsec is king in Haskell-land. In Clojure, however, there are a number of libraries, fully- and not-so-fully written, that can be used to write parsing programs. fnparse, Amotoen, clarsec, parser, and clj-peg are just a few; feel free to mention your favorites in the comments. I don’t mean to leave any out, but rather to point out that what I’m doing here is not new. I do hope it’s illuminating for some.

Parsec, as I see it, boils down to 2 ideas:

  • A parser either consumes input or it doesn’t. Consumed or Empty.
  • A parser either succeeds in parsing or it fails. Ok or Err.

These outcomes can be combined into 4 continuation functions that are passed to every parser:

  • cok – Consumed & Ok
  • cerr – Consumed & Err
  • eok – Empty & Ok
  • eerr – Empty & Err

As for errors, Parsec defines two types of them: those that we can say something about, and those that we can say nothing about. These are errors with messages and unknown errors, respectively. Of the errors that we can say something about, some are the result of not finding input that the parser was expecting, which leads to messages like “expected ‘a’ and found ‘b'”, and some are the result of not finding input where we expected to, which leads to messages like “unexpected end of input”.

Finally, Parsec keeps tabs on the thing it’s parsing; it maintains state. The state is made up of 2 elements: the input stream itself, which a Clojure seq models nicely, and the current source position, itself made up of the name of the input and one’s current line and column location in it.
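The code below leans on a few names it never defines: the InputState. and ParseError. constructors and an unexpect-error helper. Here’s a sketch of what they might look like; the field names are inferred from how they’re used, so take them as assumptions rather than Parsec gospel.

;; sketches only; field names inferred from usage in the parsers below
(defrecord SourcePos [name line column])
(defrecord InputState [input pos])
(defrecord ParseError [pos msgs])

;; my guess at the error helper called by token-prim below
(defn unexpect-error [msg pos]
  (ParseError. pos [(str "unexpected " msg)]))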

The Most Basic Parsers

The simplest parser is the one that, no matter what, returns a constant value. This is called parserReturn in Haskell, but in Clojure it’s more akin to the constantly function, so I’ve named it always, and here’s its simplified implementation:

(defn always [x]
  (fn [state cok cerr eok eerr]
    (eok x state)))

This implementation makes sense. No matter what, it returns a new parser. A parser is merely a fn that takes a state and 4 continuations. The always parser always calls the Empty & Ok continuation: nothing was removed from the stream (hence the Empty part), and everything should continue on as normal (the Ok part).
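To see the mechanics, here’s what hand-running an always parser looks like, with four inline continuations that just tag their outcome (the state map is a stand-in, not a real InputState):

;; only the Empty & Ok continuation ever fires
((always 42)
 {:input (seq "abc") :pos nil}          ; stand-in state
 (fn [item state] [:consumed-ok item])  ; cok
 (fn [err] [:consumed-err err])         ; cerr
 (fn [item state] [:empty-ok item])     ; eok
 (fn [err] [:empty-err err]))           ; eerr
;; => [:empty-ok 42]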

Equally simple is the parser that always fails. This is called parserZero in Haskell, since it represents a “nothing” parser.

(defn parser-zero []
  (fn [state cok cerr eok eerr]
    (eerr (unknown-error state))))

(defn unknown-error [{:keys [pos] :as state}]
  (ParseError. pos []))

More Interesting Parsers

One of the more basic parsers in Parsec is tokenPrim, which processes a single element from the underlying stream. It unconses the first element from the head of the input, tests whether it is supposed to be consumed, and then updates the state’s current position in the input. To do this, it takes 3 functions:

  • nextpos – calculates a new source position based on the item consumed and the old position
  • test – takes a single element from the underlying stream and returns whether or not to consume it
  • showToken – used to create readable error messages by returning a string representation of stream elements

(defn token-prim [show-f nextpos-f consume?]
  (fn [{:keys [input pos] :as state} cok cerr eok eerr]
    (if-let [s (seq input)]
      (let [item (first s)
            rest-of-input (next s)]
        (if (consume? item)
          (let [newpos (nextpos-f pos item rest-of-input)
                newstate (InputState. rest-of-input newpos)]
            (cok item newstate))
          (eerr (unexpect-error (show-f item) pos))))
      (eerr (unexpect-error "" pos)))))

There are three ways the above function continues. Two are through eerr: one when there is nothing left in the seq and we were expecting to parse something, and one when we did parse something but our test told us not to consume it. In the second case we can produce a decently readable description of the item so that we can later present it to the user. Finally, if our test tells us to go ahead and consume the item, we call cok, passing it the item and a newly calculated state with a new position and the input without our consumed item on the front.
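To put token-prim to work, here’s a sketch of a single-character parser built on top of it. The position bookkeeping assumes the SourcePos sketch from earlier; a real implementation would also handle tabs and the like.

;; a parser that consumes one specific character
(defn inc-pos [pos item _]
  (if (= item \newline)
    (assoc pos :line (inc (:line pos)) :column 1)
    (assoc pos :column (inc (:column pos)))))

(defn char-of [c]
  (token-prim str inc-pos #(= c %)))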

There are a lot of parsers we can implement on top of token-prim; however, it’s got no brain. You can only line up a number of token parsers one after another and let them tell you if the input matched in the order you thought it would. We can’t express the idea of “or” with it. For that, Parsec relies on the parserPlus parser. It’s called “plus” because it’s used to glue multiple parsers into a single one, analogous to how addition of numbers glues them all together into a new, single number (I never used to think about things like this. Haskell has made me re-understand everything I already knew).

The strategy for implementing parserPlus is that it will take 2 parsers and try the first one. If that succeeds, we’ll go with that. If it doesn’t, we try the second one, and if it succeeds we want our combined parser to be indistinguishable from that second parser. If neither work, then our parser didn’t work and we want to escape like any other parser would if it failed. Calling the first parser is easy. For the sake of staying close to the original Haskell, we’ll call this parser m. Parsers in Haskell and Clojure are simply functions, so in order to try it, we can invoke it and pass the current state and the 4 continuations it expects.

The continuations are our hook to intercept failures. We know that if m fails, it will call the fourth continuation we pass it. So to try the second parser, n, we’re going to wrap the eerr function (the 4th continuation), trying that second parser before giving up and calling eerr. Here’s how it looks in Clojure:

(defn parser-plus [m n]
  (fn [state cok cerr eok eerr]
    (letfn [(meerr [err]
              (letfn [(neok [item state-prime]
                        (eok item state-prime))
                      (neerr [err-prime]
                        (eerr (merge-error err err-prime)))]
                (n state cok cerr neok neerr)))]
      (m state cok cerr eok meerr))))

The locally nested functions aren’t exactly readable at a glance, but combined with the knowledge of what’s happening, it’s a really elegant way to express the idea. Also, as a small note, there aren’t great names for some of the nested function parameters. state-prime and err-prime? Well, that’s a holdover from Haskell to express that the thing is an altered version of the thing it came from. In mathematics, this is expressed as a tick, state' and err'. Those aren’t legal Clojure 1.2 identifiers, so I opted to be verbose. Starting with the Clojure 1.3 alphas available now, tick is a legal constituent character, which means you can use it anywhere in an identifier except as the first character.
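To see parser-plus in action, alternation falls out directly; using the hypothetical char-of from the token-prim sketch above:

;; matches an 'a', or, if that fails without consuming input, a 'b'
(def a-or-b (parser-plus (char-of \a) (char-of \b)))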

The last parser I’d like to tackle in this blog post is manyAccum. This parser wraps behavior around an existing parser and so becomes a tangle of continuation functions just like parser-plus was, but unlike parser-plus, manyAccum only accepts one parser and attempts to apply it 0 or more times. This is the parser equivalent of the Kleene star.

Just like parser-plus, we’re going to invoke the parser manyAccum is given and create a new parser by manipulating the continuations we pass to it. Specifically, if the parser we’re given fails to consume any input (calls eerr), we’re going to hijack that and report that it was instead an eok with an empty list. If the parser succeeds in consuming input, we’re going to try to get it to do it again. And again. And again, forever. Here’s what it looks like:

(defn many-accum [p]
  (fn [state cok cerr eok eerr]
    (letfn [(many-err [err]
              (throw (RuntimeException. "combinator '*' is applied to a parser that accepts an empty string")))
            (continue [coll item state-prime]
              (p state-prime (partial continue (cons item coll)) cerr many-err
                 (fn [_] (cok (cons item coll) state-prime))))]
      (p state (partial continue (seq [])) cerr many-err (fn [_] (eok [] state))))))

We define many-err to immediately quit with an exception if the third continuation, eok, is called, since that means that p accepts empty strings and would spin forever if we let it. The only other trick to many-accum is that we create continue to accumulate items by first calling it with an empty seq, (seq []), and then consing further consumed items onto the front. Haskell’s manyAccum takes a cons-like operator in addition to p as a more flexible way of creating a list of elements.

A Final Note

I intentionally stayed away from Monads in this post (which is no easy task when porting Haskell), averting my eyes from Konrad Hinsen’s clojure.contrib.monad and trying wherever possible to make Clojure functions feel less like Haskell functions obsessed with parentheses. Not because Monads are particularly special or complex, but rather just the opposite. Monads fall out of designs that favor composability and uniformity. The first parser of this post, always, is half of an implementation of Monad. parser-zero and parser-plus are 100% of a smaller class of monads called MonadPlus. Reading clj-http’s source, I felt like it was such clean and idiomatic Clojure, with fantastic composability properties that made it easy to build on top of, but also like it would be very easy to express in Haskell and not feel forced or awkward. So it’ll be interesting to finish this port and see if I can succeed in doing the same in the opposite direction.
