of stats and scraping
This all started when I wanted to share some stats I had been calculating privately with the rest of my fantasy baseball league.
Last season I wrote a handful of emacs lisp functions to scrape Yahoo!’s fantasy baseball site, with the intention of having a status update on my team every time I opened an emacs session to do my day’s work. (The faithful among us will note that I am far out of my depth and Doing It Wrong if I actually close my emacs sessions at all, much less do so every morning. Duly noted.) So I had the raw numbers within my reach.
Getting those numbers onto a web site for the league was a bit of a problem, as all of my fantasy baseball code was written in emacs lisp. Emacs can invoke programs from your system and redirect their output to one of its buffers. My code leaned heavily on this feature to invoke wget, using it to handle sessions and cookies while I slurped up pages from our league’s site, and then used emacs’ extensive buffer search facilities to ferret out and collect the data I was interested in.
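For anyone who does not read emacs lisp, the shape of that trick can be sketched in Python. This is not my original code, only a rough equivalent: run wget as an external program with a cookie jar, capture the page it writes to standard output, and search the text with a regular expression. The function names, the cookie file and the table markup in the pattern are all made up for illustration; the real pages look different.

```python
import re
import subprocess


def wget_command(url, cookie_file):
    """Build a wget invocation that reuses a cookie jar and prints the page to stdout."""
    return [
        "wget", "--quiet",
        "--load-cookies", cookie_file,   # send cookies saved by an earlier request
        "--save-cookies", cookie_file,   # write back any cookies the site sets
        "--keep-session-cookies",        # session cookies are dropped otherwise
        "--output-document=-",           # "-" sends the page to standard output
        url,
    ]


def fetch_page(url, cookie_file):
    """Run wget and return the page body as text (raises if wget exits non-zero)."""
    result = subprocess.run(
        wget_command(url, cookie_file),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical markup: a stat name cell followed by a stat value cell.
STAT_PATTERN = re.compile(
    r'<td class="stat-name">([^<]+)</td>\s*<td class="stat-value">([^<]+)</td>'
)


def extract_stats(page_text):
    """Collect every name/value pair the pattern finds into a dict."""
    return {name.strip(): value.strip()
            for name, value in STAT_PATTERN.findall(page_text)}
```

Keeping the command builder and the parser separate from the network call means the fiddly parts can be tried out on a saved page without touching the site.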
In order for the web site to be useful at all, it would need to be updated daily (something that Yahoo! itself has a spotty track record of). It would have been possible to cron a task that started emacs and executed my emacs lisp scripts, but, again, those faithful among us would be wincing in pain at the thought of opening and closing emacs so often. It’s not an elegant solution.
I settled on using Steel Bank Common Lisp to solve my problems. It adds threads and function scheduling to the Common Lisp toolkit. I decided I would start an SBCL image and keep it running for the duration of the season. I would schedule a function to run every morning that would scrape stats and then insert those stats into a CouchDB database. It would work beautifully, harmoniously. The faithful would be pleased. Problem was, I had never used Common Lisp like this before. My exposure was limited to starting up an image, hacking around until I produced some string or file or number useful to me, and then shutting it down and going on my way. I had no right to expect that I could keep an SBCL session running for 182 games during the summer.
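The plan is easier to see as code than as prose. What follows is a minimal sketch in Python rather than the SBCL code itself: one long-lived process, a timer that fires every morning and re-arms itself, and a helper that builds the HTTP request CouchDB expects when you create a document (a POST of a JSON body to the database URL). The 6 o’clock hour, the function names and the database URL used in the usage note are assumptions for illustration.

```python
import json
import threading
import urllib.request
from datetime import datetime, timedelta


def seconds_until(hour, now):
    """Seconds from `now` until the next time the clock reads hour:00."""
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)   # today's slot has passed, aim for tomorrow
    return (target - now).total_seconds()


def couchdb_insert_request(db_url, doc):
    """Build (but do not send) a POST that asks CouchDB to store `doc`."""
    return urllib.request.Request(
        db_url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_daily(job, hour=6):
    """Arm a timer that calls job() at hour:00, then re-arms for the next day."""
    def fire():
        try:
            job()
        finally:
            run_daily(job, hour)      # schedule the next run even if job() failed

    timer = threading.Timer(seconds_until(hour, datetime.now()), fire)
    timer.daemon = True               # the timer alone will not keep the process alive
    timer.start()
    return timer
```

A morning job would scrape the pages, then pass each stats document to `urllib.request.urlopen(couchdb_insert_request("http://localhost:5984/stats", doc))`, where the database name `stats` is hypothetical. Because the timer thread is a daemon, something else has to keep the process running all season, which is exactly the part I had never tried before.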
If I did not have a touch of mental illness — a little miswiring in my wetware subroutines — I might have let that thought stop me, and my next few posts would be about a wonderfully prosaic, conventional and dull system written in Java. It won’t be.