Two Guys Arguing

Mountains of Obfuscation

Posted in challenge by benjaminplee on 02.26.12

What would you do?

You have a couple hundred files across several nested folders that contains within them 10000+ obfuscated random and unique user names.  You also have a translation file of all of the obfuscated names and new ones you ned to have them replaced with.  Some files have all 10000+ obfuscated names in them.  Some have none or just a couple.

How would you update all of the files?

Obviously this isn’t a crazy hard problem; there are many ways to solve it.  I know how I solved it.  I am curious as how YOU would solve it.

Go!

 

[edit: added clarification]

Competing Parsers

Posted in clojure, haskell by youngnh on 09.28.11

So, I’m writing a parser for the Turtle serialization format for RDF. In addition to being a format we use all the time at Revelytix, its a decently compact grammar, giving me a good chance to implement it using The Parsatron and suss out some of the library’s rough edges.

I hit my first rough edge with the longString production:

longString ::= """" lcharacter* """

but, having already implemented the lcharacter parser already, I didn’t see the subtleties in this production and plowed ahead with this straightforward definition:

(defparser long-string []
  (between (times 3 (char \")) (times 3 (char \"))
           (many (lcharacter))))

Which looks great and compiles without complaint, but when you feed it input, it immediately complains:

> (run (long-string) "\"\"\"roughshod\"\"\"")

Unexpected end of input at line: 1 column: 16
[Thrown class java.lang.RuntimeException]

The message here could be better, and I’ll work on that. I would want it to say Unexpected end of input, expected '"""', because what happened was the (many (lcharacter)) parser consumed too much.

Turns out, lcharacter is defined in the grammar to include double quotes, so (many (lcharacter)) ate as many as it could until it literally ran out of input.

A good regex can handle this:

> (re-find #"\"\"\".*\"\"\"" "\"\"\"roughshod\"\"\"")
"\"\"\"roughshod\"\"\""

So we should be able to as well. To keep track of whether or not we’ve consumed a hat trick of quotes, my first attempt looked something like this:

(defparser long-string []
  (letfn [(middle-part [s]
            (let->> [c (lcharacter)]
              (if (= c \")
                (two-left s)
                (middle-part (concat s [c])))))
          (two-left [s]
            (let->> [c (lcharacter)]
              (if (= c \")
                (one-left s)
                (middle-part (concat s [\" c])))))
          (one-left [s]
            (let->> [c (lcharacter)]
              (if (= c \")
                (always s)
                (middle-part (concat s [\" \" c])))))]
    (>> (times 3 (char \"))
        (middle-part []))))

Which uses 3 local, mutually-recursive functions to “count” each consecutive double quote. And they all look a lot alike. I refactored to this:

(defparser long-string []
  (letfn [(middle-part [s n]
            (let->> [c (lcharacter)]
              (if (= c \")
                (case n
                      0 (middle-part s 1)
                      1 (middle-part s 2)
                      2 (always s))
                (middle-part (concat s (repeat n \") [c]) 0))))]
    (>> (times 3 (char \"))
        (middle-part [] 0))))

The above works well, but the problem arises in the first place because lcharacter and """ share the same single-character lookahead. By examining only the next character in the input, we can’t
tell if it should belong to lcharacter or """. This suggests that we can lookahead 3 characters at a time and if we receive """, then we can interpret that not as 3 lcharacters, but as a terminating triple double quote.

(defparser long-string []
  (between (times 3 (char \")) (times 3 (char \"))
           (many
            (let->> [cs (lookahead (times 3 (lcharacter)))]
              (if (= cs [\" \" \"])
                (never)
                (lcharacter))))))

I’m not sure quite which way to go, nor can I immediately see a way to make a higher-level lookahead parser that ensures that 2 parsers don’t stomp all over each other, though that would be quite ideal. If you can, chime in below in the comments.

If you’d like to follow the development of The Parsatron, it’s on github

Tagged with:

Pro Tips: Get Your Windows MSI Installer Logs

Posted in .net, notes_to_self by benjaminplee on 07.25.11

Want more information about what an .msi is doing on your system or what caused a generic and completely useless error message to pop up?  Run the following command substituting the path to your .msi from the command line and the output file will contain verbose logs.

msiexec /i filename.msi /l*v log.txt
Tagged with: , , , , ,

Pro Tips: Run Multiple Jenkins CI Servers on a Single Machine

Posted in software development, software testing by benjaminplee on 07.24.11

Jenkins (or Hudson if you work for Oracle) is a great, simple and steady continuous integration and build server.  One of its greatest features is that despite being packaged as a standard .war ready to be dropped into your JEE web container of choice … it also contains an embedded web container that makes the .war everything you need in most situations.  Simply run

java -jar jenkins.war

and up starts Jenkins on port 8080 in all of its glory.  All of Jenkins’ configuration files, plugins, and working directories go under <USER_HOME>/.jenkins by default.  Perfect.

Except if you want to run multiple instances on the same machine.  We are going to get a config folder collision.  It turns out this is a piece of cake, but the docs are hard to find.  Jenkins will use a JENKINS_HOME path environement variable for its configuration files if one is set so a simple change means we can run Jenkins out of any directory we desire:

java -DJENKINS_HOME=/path/to/configs -jar jenkins.war

What about the port you say?  Winston, the embedded web container used, has a simple property for this too:

java -DJENKINS_HOME=/path/to/configs -jar jenkins.war --httpPort=9090

Pro Tips: Color Your Tail With Perl

Posted in java, linux by benjaminplee on 03.22.11

We ran into a situation today where our JEE server was logging everything twice.  Sadly, this meant that errors and warnings were logged as INFO and were missed.

2011-03-22 11:40:40,884 INFO  [STDOUT] (main) 11:40:40,884 WARN [ProcessRoles] Could NOT find role XXXX

Since we watched the logs most often using the tail command, the quick and dirty solution was to have tail output lines containing WARN/ERROR/SEVERE in different colors that would stand out as the scrolled past.  The bash/perl script below is the fruit of my hacking.  Suggestions for improvements welcome.

t() {
tail -100f $1 | perl -pe 's/^.*SEVERE.*$/\e[1;37;45m$&\e[0m/g' | perl -pe 's/^.*ERROR.*$/\e[1;37;41m$&\e[0m/g' | perl -pe 's/^.*WARN.*$/\e[1;33;40m$&\e[0m/g'
}
Tagged with: , , , ,

guarding an expression

Posted in clojure by youngnh on 02.01.11

Oftentimes, I want to make some computation, apply the result to a predicate, and if it passes, return that result. If the predicate does not succeed, I usually want to return some other value. I’ve run into this situation before, with no satisfying resolution. My code usually ends up looking like so:

(let [value (some-computation x y z)]
  (if (check? value)
    value
    some-other-default-value))

Which is overly verbose, even for Clojure, if you ask me. There’s a let and value appears 3 times for a fairly straightforward idiom. I think what I’d like to write instead is:

(guard check?
  (some-computation x y z)
  some-other-default-value)

The following macro does the trick of expanding to the verbose form I’ve been writing:

(defmacro guard [pred then else]
  `(let [x# ~then]
     (if (~pred x#)
       x#
       ~else)))

Now, the awkward thing about the above is that unlike an if-statement, the order in which you read things is not the order in which they get executed in. Reading it, the execution happens on line 2 first, line 1 second, and then possibly on line 3. My question to the blogosphere is, does Clojure already have something like this lurking in a lib somewhere? Or is there a blindingly obvious solution to this that I’m overlooking?

2011.tech_resolutions.pop(5.questions)

Posted in questions by youngnh on 01.06.11
  1. What are you currently hacking on?

    Just before Christmas, I put together a small LL(1) parser-generator in Clojure. LL(1) has the distinction of being one of the simplest classes of grammars to parse, and a great number of textbooks on the subject will tell you that it’s often quicker to write a parser by hand for a LL(1) grammar than to go to the trouble of using a parser-generator. I think my code removes most of those objections. However, it currently suffers from 4 problems that I vowed I would remedy before it saw the light of day: its very slow for only marginally deep parse trees, it has no tests, no documentation, and the error messages it spews forth are currently only meaningful to myself. None of these problems are sexy, nor could the process of solving them be construed as “hacking”, so my motivation to finish the project has plummeted.

  2. What do you want to hack on more in 2011?

    I have no shortage of ideas that could be expressed via computational algorithms, but recently I’ve had a harder and harder time convincing myself that any of them are significant in any way other than the challenge in realizing them. This year I’d like to stay away from the “because it’s there” projects. I’d like to hack on things that:

    • someone would pay money for: a fart app, for example, or possibly one that does facial recognition so that if you think you’ve seen somebody famous on the street or simply can’t remember the name of the person you’re currently talking to, you can pull out your phone and shove its tiny camera in their face
    • something that people I’ve never met before would find useful: most of my keystrokes are deleted, a select few are saved to files, fewer still find themselves a part of a working project, and of those projects that people other than myself get their hands on, their purpose could generously be described as niche. I write software largely for the compulsive pleasure of seeing an ephemeral thought produce flashing pixels on a screen, but I’d like to try writing something that might be useful to others at the same time.
    • change an old way of doing things: Starbucks changed the way you order drinks, parenting at dinner is a contentious process, and the airline industry has been sticking with a sub-optimal solution to one of the hardest problems in all of computation but that doesn’t mean a few guys didn’t throw a bunch of servers at the problem and give it a go of their own (and crashed and burned). These are hacks in the best sense of the word.
    • change how non-programmers interact with computers: it pains me to watch people use software. You might already be rolling your eyes in preparation for a rant from a programmer about non-technical people’s poor computer skills, but that’s not it. What I see when my mother-in-law or grandfather use their computers is the incredibly resourceful, thoughtful and clever way in which they approach an inanimate, ineffable and often arbitrary hunk of plastic and silicon, and manage to make it do unbelievably complex things. I’ve read the manuals, I spent four years in college and all of my professional career learning how these things are put together and I can barely get my wireless printer to work. The only difference between us is that I have the ability to change the machine to match their understanding of it. Helping non-programmers is my peace corps, my doctors without borders.
  3. What do you want to stop hacking on in 2011?

    I think what I wrote above covered most of what I’d like to hack on, and being a very good dichotomist, I’d like to stop hacking on anything that isn’t a Big Idea. This is not to confuse Big Ideas with small hacks. Big projects never get finished. Short sprints of interesting and valuable concepts are strongly encouraged, not the least for morale. Anything that someone might ask “Why that?” of, and that I would respond with “well, because it just seemed neat…” should be viewed with strong suspicion in 2011.

  4. Big idea that just needs a free weekend to come to life?

    As you wrote big idea with a little “i”, and I’ve just finished pontificating on big “I” ideas, I’m going to say a Java compiler that does type inference. I haven’t done my homework and checked if something like this already exists, so if it does, please point me in its direction. Aside from the massive amount of typing (lol) it’d save, why should Scala get to have all the fun? A slightly more powerful compiler might lead to a better, if sloppier, place for the Java masses.

  5. How can we get more arguing on this blog?

    We could go find that youtube video of two kids having an argument and put it up here, thereby reclaiming the rest of the hits from Google searches for “two guys arguing”.

    If you’d like something software-related, pick the “social sciences” of programming, like methodologies, or productivity enhancements or what editor to use, as they often don’t have verifiable outcomes and thus can generate argument indefinitely.

    Personally, I’ve found that being wrong about a thing leads to more comments and discussion than solving that thing competently. On a positive note, I’d like to think that it has been our extreme competence in what we do that has kept this blog from rocketing us into the stratosphere of computer scientists around the world.

A Solution from Purgatory (for the Matrix from Hell)

Posted in challenge, math, questions by benjaminplee on 01.02.11

This post attempts to recount some of my thought process which eventually (more than a year later) lead to the Matrix from Hell puzzle I posted about a few days ago.  If you don’t want to know my answer, don’t read the rest of this post.  My final solution is at the bottom.

You have been warned.

~~~~~~~~~~~~~~~~~~~

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

~~~~~~~~~~~~~~~~~~~

Below is a list of my general thought process and the main ideas/revelations/halucinations that I know helped me to my solution.  An annotated matrix is at the bottom with some of the ideas highlighted in different colors.

  • Should indices start at 0 or 1?
  • Values along either axis count up by 1 in both directions when the other =0
  • Values are duplicated over the f(x)=y axis –> Order of indices given to function is inmaterial since they can be swapped
    [The RED 5 can be seen duplicated over the diagonal]
  • Calculation of max/min index is O(1) so stop worry about it and “forget” the bottom half of the matrix (assume X>= Y)
  • Values generally increase as they go right/down but not always
  • f(x,x)=0, f(x,0)=x
  • Values are symmetrical perpendicular to f(x)=y IFF the bounding square in question is a power of 2
  • “Quad” patterns can be seen repeating for every power of 2.  In each “Quad” the top left quadrant is the same as the bottom right and the top right is the same as the bottom left.
    [Each "Quad" of the example matrix is outlined in BLUE with the quadrants in LIGHT BLUE. PINK shows the repetition]
  • power of 2 repeating nature of the values leads to thinking of binary trees
  • binary trees leads to thinking about a possible solution which could run in O(log2 n) time:
    • In a nut shell, for a given (X,Y), algorithm would find smallest boudning power of 2 including X and would walk back “Quad”s by each successivly smaller power of 2 “Quad” until the last one is met.  For each “Quad” determine which quadrant the value lives in and remap to root values.
    • Debate if this would really work for large numbers and if it could be reduced to constant time function by math that I have forgotton from college/HS
    • Dismiss b/c I can’t remember a way and base idea is > O(1)
  • Binary trace back algorithm idea makes me realize that it can be also looked at as the “binary” representation of a number (e.g. 01011011010) where each 1 represents a “Quad” where the value was in the upper right or bottom left and vica-verca.
    [Following the matrix value at (5,6) can be seen ORANGE.  The value 3 is in the bottom right  of the 1st "Quad", then BL, & TR. {BR BL TR} => {101} => {3}]
  • Remembered that f(x,y)=0 IFF x==y …. which means that difference between X and Y is important to value of f()
  • Eureka!
    • F(X, Y) = X xor Y
    • solution typed above in white on white to protect the innocent. highlight to view.

sample matrix with some of my thought process drawn out

That is it.  I never said it was pretty or made any sense.

Tagged with: , , , ,

2011.tech_resolutions.push(5.questions)

Posted in questions by benjaminplee on 01.01.11

In the spirit of the first posts to this blog and the new year I poste the following 5 questions to Mr Young…

  1. What are you currently hacking on?

  2. What do you want to hack on more in 2011?

  3. What do you want to stop hacking on in 2011?

  4. Big idea that just needs a free weekend to come to life?

  5. How can we get more arguing on this blog?

Tagged with: ,

C# Meta-Programming And Extension Methods Question

Posted in .net, functional programming, questions by benjaminplee on 12.30.10

First, it should be noted that my ASP.Net MVC-foo is yet great and I know this.  If you know a better way to do this, please take the time to explain how you are superior.  If you don’t, how will we ever know?

Working with ASP.Net MVC 2 client-side form validation this past week, I ran into two unfortunate (IMHO) facts which kept tripping me up:

  1. Html.ClientValidationEnabled() must be called as a scriptlet BEFORE the form is created
  2. Html.ValidateFor or Html.ValidationMessageFor must be called for EACH property of the model you want validated.

*** This StackOverflow answer includes a couple other important & useful things to note

These two issues (and many more) bother me about ASP.Net MVC 2 client-side validation but #2 in particular was really ticking me off.  I like that I can conditionally turn validation on/off on a field by field basis … but why make me enumerate all validation-needed model fields twice in my view!?  Especially in my case where I want validation turned on for ALL model object fields that have DataAnnotation Validations defined for.  Obviously the fields are going to be enumerated in the view once; how else are the text boxes and drop downs going to be created, but why make me list out each field AGAIN just to turn on validation?

All of this amounts to one WET solution. Am I missing something?

The best solution I have seen, and the one I am about to start implementing, is this one by Rick Anderson over at MSDN found via this related forum post.  The idea is to create additional Html helper extension methods which will combine the creation of the HTML form element with turning on and configuring validation.  This solution has the benefit of making me enumerate the model fields being used only ONCE in my view code and reads cleaner.

The only problem is that I have to walk through our application, find anywhere that is using a built in HtmlHelper extension method to build an HTML form element (e.g. input tag), build a new validating wrapper extension method that matches the signature of the original, and then replace the original code with the new extension method.  This can be done, but is time consuming and error prone.  And I am lazy.

The real issue …

I want to … create additional HtmlHelper extension methods based on all existing ones that produce form elements and take in a Linq expression to find the model property that first pass that expression to helper.ValidateFor() and then return the original extension method’s valued.

Is there any meta-programming feature or technique in C# that will allow me to concisely do that?

e.g.

Since Html.TextBoxFor(user => user.FirstName) is used in our app, as it stands I will am going to create a custom extension method similar to:

public static class ValidatingFormElementsExtension
{
  public static MvcHtmlString ValidatedTextBoxFor<TModel, TProperty>(this HtmlHelper<TModel> helper, Expression<Func<TModel, TProperty>> expression)
  {
    helper.ValidateFor(expression);
    return helper.TextBoxFor(expression);
  }

  // and so on for EACH one we use

}

Anyone have any ideas?  Am I completely off base with this? Is there a better way?

~~~~~~~~~~

<mini-related-rant>

A few more quick cracks at ASP.Net MVC  that I need to get off my chest

  • Why are so many things done in scriptlets!?  I feel like I am back in the vanilla JSP days of yesteryear.
  • They are Annotations in Java and Attributes in C# …. so why are the data validation attributes defined in System.ComponentModel.DataAnnotations?
  • I get that HTML checkbox form input fields have TRUE/FALSE values passed back when the form is submitted, and that doesn’t map perfectly to how other fields are handled by the Required attribute …. but why was NOTHING included to do this?  Having to use a RegularExpression validator for [Tt]rue is clumsy no?

</mini-related-rant>

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: