Two Guys Arguing

One fn to bind them

Posted in clojure, haskell by youngnh on 11.21.10

I had a chance to work on my parsec port a little this weekend. Say hello to one of the most important and ubiquitous parsers in the parsec arsenal, parser-bind.

The idea behind parser-bind is that it should squish two parsers together. It represents parsing one thing after another. The only other parser we’ve built that squishes two parsers together is parser-plus, which operates more like “or” in that if the first one fails, it tries the second. parser-bind, by contrast, quits immediately whenever either parser fails. If it succeeds, it’s because each parser matched, one after the other, on successive input.

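(defn parser-bind [m n]
  (fn [state cok cerr eok eerr]
    (letfn [;; m consumed input: whatever n does, the pair has consumed
            (mcok [item state]
              (n state cok cerr cok cerr))
            ;; m consumed nothing: n alone decides consumed vs. empty
            (meok [item state]
              (n state cok cerr eok eerr))]
      (m state mcok cerr meok eerr))))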

If the first parser, m, consumes ok but the second one, n, does not consume, our combined parser still calls the cok continuation. Conversely, if the first one is empty and ok but the second one consumes, we also escape via the cok continuation. parser-bind does not wrap any of the error-handling continuations in new functions, because if something goes wrong we use them to exit immediately. The one adjustment is that once m has consumed input, an empty error from n is reported through cerr rather than eerr.
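To make the four cases concrete, here is a minimal sketch that tags which continuation fires. It assumes a state is just a seq of the remaining characters, that ok continuations take (item state) as mcok and meok do, and that error continuations take a single message. item-char, always and run are hypothetical helpers written for this illustration, not part of the port.

```clojure
;; Hypothetical helper: consumes the character c, or fails without consuming.
(defn item-char [c]
  (fn [state cok cerr eok eerr]
    (if (= c (first state))
      (cok c (rest state))
      (eerr (str "expected " c)))))

;; Hypothetical helper: succeeds with x without consuming any input.
(defn always [x]
  (fn [state cok cerr eok eerr]
    (eok x state)))

;; parser-bind as described in the post, with balanced parentheses.
(defn parser-bind [m n]
  (fn [state cok cerr eok eerr]
    (letfn [(mcok [item state]
              (n state cok cerr cok cerr))
            (meok [item state]
              (n state cok cerr eok eerr))]
      (m state mcok cerr meok eerr))))

;; Hypothetical runner: returns a vector naming the continuation that fired.
(defn run [p input]
  (p (seq input)
     (fn [item state] [:cok item])
     (fn [err] [:cerr err])
     (fn [item state] [:eok item])
     (fn [err] [:eerr err])))

(run (parser-bind (item-char \a) (always :done)) "ab") ;; => [:cok :done]
(run (parser-bind (always :x) (item-char \a)) "ab")    ;; => [:cok \a]
(run (parser-bind (always :x) (always :y)) "ab")       ;; => [:eok :y]
(run (parser-bind (item-char \a) (item-char \b)) "ax") ;; => [:cerr "expected b"]
```

The last call shows the consumed-error case: item-char on its own fails through eerr, but because m already consumed the \a, the failure surfaces through cerr.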

The useful part of parser-bind isn’t in the above implementation, and the above isn’t how parsec implements the idea. Parsec’s implementation does take the first parser, m, but for its second argument it takes a function that, when called with m’s result, returns the second parser.
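A rough sketch of what that function-taking variant might look like follows. It is not the port’s actual code: the name parser-bind-fn, the stand-in parsers one-of* and char*, and the run helper are all hypothetical, and the sketch assumes a state is a seq of remaining characters, ok continuations take (item state), and error continuations take a single message.

```clojure
;; Hypothetical stand-in for one-of: consumes any character found in chars.
(defn one-of* [chars]
  (fn [state cok cerr eok eerr]
    (let [c (first state)]
      (if (and c (some #{c} chars))
        (cok c (rest state))
        (eerr (str "expected one of " chars))))))

;; Hypothetical stand-in for the char parser: consumes exactly c.
(defn char* [c]
  (fn [state cok cerr eok eerr]
    (if (= c (first state))
      (cok c (rest state))
      (eerr (str "expected " c)))))

;; Like the two-parser version, except the second parser is built at
;; parse time by calling f on the item that m produced.
(defn parser-bind-fn [m f]
  (fn [state cok cerr eok eerr]
    (letfn [(mcok [item state]
              ((f item) state cok cerr cok cerr))
            (meok [item state]
              ((f item) state cok cerr eok eerr))]
      (m state mcok cerr meok eerr))))

;; Hypothetical runner: collapses the four continuations into ok or error.
(defn run [p input]
  (p (seq input)
     (fn [item state] [:ok item])
     (fn [err] [:error err])
     (fn [item state] [:ok item])
     (fn [err] [:error err])))

;; Parse one of a, b or c, then demand the same character again.
(def doubled (parser-bind-fn (one-of* "abc") (fn [c] (char* c))))

(run doubled "bb") ;; => [:ok \b]
(run doubled "ba") ;; => [:error "expected b"]
```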

This is a neat idea because, unlike a parser that has to be fully specified at write-time, a function can bind intermediate, runtime results. Those intermediate results, once bound and named, can be used to create further parsers. It allows us to write let-like forms:

(p-let [c (one-of "abc")]
  (char c))

Each binding in the parser let has to be a pair: a destructuring form followed by a parser. The above is a parser that parses a character and then looks for a duplicate of what it just parsed, similar to a backreference to a capture group in regular expressions. p-let uses parser-bind under the covers:

(defmacro p-let [[& bindings] & body]
  (let [[bind-form p] (take 2 bindings)]
    (if (= 2 (count bindings))
      `(parser-bind ~p (fn [~bind-form] ~@body))
      `(parser-bind ~p (fn [~bind-form] (p-let ~(drop 2 bindings) ~@body))))))

Given only a single binding pair, we make its parser the first argument to parser-bind and wrap the body in a function whose argument is the pair’s destructuring form. For longer binding forms, we produce a recursive structure that macroexpand will continue to expand one binding pair at a time.
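The recursion can be watched one step at a time with macroexpand-1, which performs only the outermost expansion. The sketch below repeats the macro so it stands alone. The forms inside the quoted p-let are never evaluated, so one-of, char and always (a hypothetical parser that returns its argument) do not need to exist for this to run.

```clojure
(defmacro p-let [[& bindings] & body]
  (let [[bind-form p] (take 2 bindings)]
    (if (= 2 (count bindings))
      `(parser-bind ~p (fn [~bind-form] ~@body))
      `(parser-bind ~p (fn [~bind-form] (p-let ~(drop 2 bindings) ~@body))))))

;; Expand only the first binding pair; the nested p-let is left untouched.
(def step-1
  (macroexpand-1 '(p-let [a (one-of "abc")
                          b (char a)]
                    (always [a b]))))

;; With namespace qualifiers stripped for readability, step-1 has the shape:
;;   (parser-bind (one-of "abc")
;;     (fn [a] (p-let (b (char a)) (always [a b]))))
(println step-1)
```

Note that the remaining bindings show up as a list, (b (char a)), rather than a vector, because drop returns a seq. The macro’s [& bindings] destructuring accepts either, so the nested p-let still expands correctly.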
