Two Guys Arguing

error: object file is empty

Posted in git by youngnh on 10.14.13

A coworker could not push a branch of his development to our company Stash server this afternoon. He tried once in the morning, but the server went down, so he stayed late until the server was online again and then tried again to push his commits, only to be hit with a very cryptic error.

I stayed late debugging it with him, and here’s what we found.

When he tried to push his branch, he got an error saying that an object file is empty and that the repo is corrupt. We figured that was that, and spent a lot of time exporting his patches, and reimporting them into a freshly cloned repo. When we got his commit history back into shape, we hit `git push` again expecting smooth sailing, but got the same error message:

error: object file .git/objects/20/8529317841b2f25e1213f7e4f6ed3a665b7311 is empty
fatal: loose object 208529317841b2f25e1213f7e4f6ed3a665b7311 (stored in .git/objects/20/8529317841b2f25e1213f7e4f6ed3a665b7311) is corrupt

So the problem likely wasn’t corruption of his local repo. It took me some time to make the mental leap here, though, that the issue was likely in the only other repo involved in the transaction. When that occurred to me, it also occurred to me how we might verify that was the case.

$ cat .git/objects/20/8529317841b2f25e1213f7e4f6ed3a665b7311
f6ed3a665b7311
x+)JMU06`01??d???~???
??f???ih??l
?

Gobbledygook, but the file isn’t empty.

$ git show abcd
tree 208529317841b2f25e1213f7e4f6ed3a665b7311

src/

Its a tree object. Tree objects are immutable binary files in the git repo representing the source structure at some point in the past. Trees point to other trees (directories) or blobs (files). I am not familiar with git’s algorithms, but I am familiar with its data structures, and at this point I think I can mentally fill in the details of what has gone wrong. On the server, the .git/objects files were created, but for whatever reason never had their full data written to them. I imagine some process during pushing walks the pushed commit’s tree object, and when that walk reaches the corrupt tree object it barfs.

The interesting thing to me was that because he created the exact files as the original commit, the tree and blob objects that git stores had the same sha1 hashes, and that prevented him from pushing his branch even though the commit hash differed (since it was made at a different time) and despite it being pushed from a freshly cloned repo.

Wrapping Up
By trivially modifying the files in the commit, mostly adding comments or whitespace, the hashes of the blobs at the leaves were changed. Trees are hashed based on their contents, so those changed as well, everything was rehashed up to the commit’s root tree object. The push walked a path of objects that weren’t corrupt and things completed successfully.

I’m not sure how our IT department can fix the corruption on the server. I tried corrupting a local repo by overwriting a tree object file, and it prevented me from running `git status`, `git gc` or `git fsck`, failing with the same error. Its possible that they could delete the exact .git/objects files mentioned in the error messages we encountered. Possibly there’s a flag so that a push can overwrite the object files on a remote, which would also fix the issue.

I’m not sure I would have figured out what was going wrong or even thought to inspect git’s lower-level structures if I hadn’t read the excellent “Git from the bottom up”. I would highly recommend it, and will try and post back here when our server gets sorted.

Mountains of Obfuscation

Posted in challenge by benjaminplee on 02.26.12

What would you do?

You have a couple hundred files across several nested folders that contains within them 10000+ obfuscated random and unique user names.  You also have a translation file of all of the obfuscated names and new ones you ned to have them replaced with.  Some files have all 10000+ obfuscated names in them.  Some have none or just a couple.

How would you update all of the files?

Obviously this isn’t a crazy hard problem; there are many ways to solve it.  I know how I solved it.  I am curious as how YOU would solve it.

Go!

 

[edit: added clarification]

Competing Parsers

Posted in clojure, haskell by youngnh on 09.28.11

So, I’m writing a parser for the Turtle serialization format for RDF. In addition to being a format we use all the time at Revelytix, its a decently compact grammar, giving me a good chance to implement it using The Parsatron and suss out some of the library’s rough edges.

I hit my first rough edge with the longString production:

longString ::= """" lcharacter* """

but, having already implemented the lcharacter parser already, I didn’t see the subtleties in this production and plowed ahead with this straightforward definition:

(defparser long-string []
  (between (times 3 (char \")) (times 3 (char \"))
           (many (lcharacter))))

Which looks great and compiles without complaint, but when you feed it input, it immediately complains:

> (run (long-string) "\"\"\"roughshod\"\"\"")

Unexpected end of input at line: 1 column: 16
[Thrown class java.lang.RuntimeException]

The message here could be better, and I’ll work on that. I would want it to say Unexpected end of input, expected '"""', because what happened was the (many (lcharacter)) parser consumed too much.

Turns out, lcharacter is defined in the grammar to include double quotes, so (many (lcharacter)) ate as many as it could until it literally ran out of input.

A good regex can handle this:

> (re-find #"\"\"\".*\"\"\"" "\"\"\"roughshod\"\"\"")
"\"\"\"roughshod\"\"\""

So we should be able to as well. To keep track of whether or not we’ve consumed a hat trick of quotes, my first attempt looked something like this:

(defparser long-string []
  (letfn [(middle-part [s]
            (let->> [c (lcharacter)]
              (if (= c \")
                (two-left s)
                (middle-part (concat s [c])))))
          (two-left [s]
            (let->> [c (lcharacter)]
              (if (= c \")
                (one-left s)
                (middle-part (concat s [\" c])))))
          (one-left [s]
            (let->> [c (lcharacter)]
              (if (= c \")
                (always s)
                (middle-part (concat s [\" \" c])))))]
    (>> (times 3 (char \"))
        (middle-part []))))

Which uses 3 local, mutually-recursive functions to “count” each consecutive double quote. And they all look a lot alike. I refactored to this:

(defparser long-string []
  (letfn [(middle-part [s n]
            (let->> [c (lcharacter)]
              (if (= c \")
                (case n
                      0 (middle-part s 1)
                      1 (middle-part s 2)
                      2 (always s))
                (middle-part (concat s (repeat n \") [c]) 0))))]
    (>> (times 3 (char \"))
        (middle-part [] 0))))

The above works well, but the problem arises in the first place because lcharacter and """ share the same single-character lookahead. By examining only the next character in the input, we can’t
tell if it should belong to lcharacter or """. This suggests that we can lookahead 3 characters at a time and if we receive """, then we can interpret that not as 3 lcharacters, but as a terminating triple double quote.

(defparser long-string []
  (between (times 3 (char \")) (times 3 (char \"))
           (many
            (let->> [cs (lookahead (times 3 (lcharacter)))]
              (if (= cs [\" \" \"])
                (never)
                (lcharacter))))))

I’m not sure quite which way to go, nor can I immediately see a way to make a higher-level lookahead parser that ensures that 2 parsers don’t stomp all over each other, though that would be quite ideal. If you can, chime in below in the comments.

If you’d like to follow the development of The Parsatron, it’s on github

Tagged with:

Pro Tips: Get Your Windows MSI Installer Logs

Posted in .net, notes_to_self by benjaminplee on 07.25.11

Want more information about what an .msi is doing on your system or what caused a generic and completely useless error message to pop up?  Run the following command substituting the path to your .msi from the command line and the output file will contain verbose logs.

msiexec /i filename.msi /l*v log.txt
Tagged with: , , , , ,

Pro Tips: Run Multiple Jenkins CI Servers on a Single Machine

Posted in software development, software testing by benjaminplee on 07.24.11

Jenkins (or Hudson if you work for Oracle) is a great, simple and steady continuous integration and build server.  One of its greatest features is that despite being packaged as a standard .war ready to be dropped into your JEE web container of choice … it also contains an embedded web container that makes the .war everything you need in most situations.  Simply run

java -jar jenkins.war

and up starts Jenkins on port 8080 in all of its glory.  All of Jenkins’ configuration files, plugins, and working directories go under <USER_HOME>/.jenkins by default.  Perfect.

Except if you want to run multiple instances on the same machine.  We are going to get a config folder collision.  It turns out this is a piece of cake, but the docs are hard to find.  Jenkins will use a JENKINS_HOME path environement variable for its configuration files if one is set so a simple change means we can run Jenkins out of any directory we desire:

java -DJENKINS_HOME=/path/to/configs -jar jenkins.war

What about the port you say?  Winston, the embedded web container used, has a simple property for this too:

java -DJENKINS_HOME=/path/to/configs -jar jenkins.war --httpPort=9090

Pro Tips: Color Your Tail With Perl

Posted in java, linux by benjaminplee on 03.22.11

We ran into a situation today where our JEE server was logging everything twice.  Sadly, this meant that errors and warnings were logged as INFO and were missed.

2011-03-22 11:40:40,884 INFO  [STDOUT] (main) 11:40:40,884 WARN [ProcessRoles] Could NOT find role XXXX

Since we watched the logs most often using the tail command, the quick and dirty solution was to have tail output lines containing WARN/ERROR/SEVERE in different colors that would stand out as the scrolled past.  The bash/perl script below is the fruit of my hacking.  Suggestions for improvements welcome.

t() {
tail -100f $1 | perl -pe 's/^.*SEVERE.*$/\e[1;37;45m$&\e[0m/g' | perl -pe 's/^.*ERROR.*$/\e[1;37;41m$&\e[0m/g' | perl -pe 's/^.*WARN.*$/\e[1;33;40m$&\e[0m/g'
}
Tagged with: , , , ,

guarding an expression

Posted in clojure by youngnh on 02.01.11

Oftentimes, I want to make some computation, apply the result to a predicate, and if it passes, return that result. If the predicate does not succeed, I usually want to return some other value. I’ve run into this situation before, with no satisfying resolution. My code usually ends up looking like so:

(let [value (some-computation x y z)]
  (if (check? value)
    value
    some-other-default-value))

Which is overly verbose, even for Clojure, if you ask me. There’s a let and value appears 3 times for a fairly straightforward idiom. I think what I’d like to write instead is:

(guard check?
  (some-computation x y z)
  some-other-default-value)

The following macro does the trick of expanding to the verbose form I’ve been writing:

(defmacro guard [pred then else]
  `(let [x# ~then]
     (if (~pred x#)
       x#
       ~else)))

Now, the awkward thing about the above is that unlike an if-statement, the order in which you read things is not the order in which they get executed in. Reading it, the execution happens on line 2 first, line 1 second, and then possibly on line 3. My question to the blogosphere is, does Clojure already have something like this lurking in a lib somewhere? Or is there a blindingly obvious solution to this that I’m overlooking?

2011.tech_resolutions.pop(5.questions)

Posted in questions by youngnh on 01.06.11
  1. What are you currently hacking on?

    Just before Christmas, I put together a small LL(1) parser-generator in Clojure. LL(1) has the distinction of being one of the simplest classes of grammars to parse, and a great number of textbooks on the subject will tell you that it’s often quicker to write a parser by hand for a LL(1) grammar than to go to the trouble of using a parser-generator. I think my code removes most of those objections. However, it currently suffers from 4 problems that I vowed I would remedy before it saw the light of day: its very slow for only marginally deep parse trees, it has no tests, no documentation, and the error messages it spews forth are currently only meaningful to myself. None of these problems are sexy, nor could the process of solving them be construed as “hacking”, so my motivation to finish the project has plummeted.

  2. What do you want to hack on more in 2011?

    I have no shortage of ideas that could be expressed via computational algorithms, but recently I’ve had a harder and harder time convincing myself that any of them are significant in any way other than the challenge in realizing them. This year I’d like to stay away from the “because it’s there” projects. I’d like to hack on things that:

    • someone would pay money for: a fart app, for example, or possibly one that does facial recognition so that if you think you’ve seen somebody famous on the street or simply can’t remember the name of the person you’re currently talking to, you can pull out your phone and shove its tiny camera in their face
    • something that people I’ve never met before would find useful: most of my keystrokes are deleted, a select few are saved to files, fewer still find themselves a part of a working project, and of those projects that people other than myself get their hands on, their purpose could generously be described as niche. I write software largely for the compulsive pleasure of seeing an ephemeral thought produce flashing pixels on a screen, but I’d like to try writing something that might be useful to others at the same time.
    • change an old way of doing things: Starbucks changed the way you order drinks, parenting at dinner is a contentious process, and the airline industry has been sticking with a sub-optimal solution to one of the hardest problems in all of computation but that doesn’t mean a few guys didn’t throw a bunch of servers at the problem and give it a go of their own (and crashed and burned). These are hacks in the best sense of the word.
    • change how non-programmers interact with computers: it pains me to watch people use software. You might already be rolling your eyes in preparation for a rant from a programmer about non-technical people’s poor computer skills, but that’s not it. What I see when my mother-in-law or grandfather use their computers is the incredibly resourceful, thoughtful and clever way in which they approach an inanimate, ineffable and often arbitrary hunk of plastic and silicon, and manage to make it do unbelievably complex things. I’ve read the manuals, I spent four years in college and all of my professional career learning how these things are put together and I can barely get my wireless printer to work. The only difference between us is that I have the ability to change the machine to match their understanding of it. Helping non-programmers is my peace corps, my doctors without borders.
  3. What do you want to stop hacking on in 2011?

    I think what I wrote above covered most of what I’d like to hack on, and being a very good dichotomist, I’d like to stop hacking on anything that isn’t a Big Idea. This is not to confuse Big Ideas with small hacks. Big projects never get finished. Short sprints of interesting and valuable concepts are strongly encouraged, not the least for morale. Anything that someone might ask “Why that?” of, and that I would respond with “well, because it just seemed neat…” should be viewed with strong suspicion in 2011.

  4. Big idea that just needs a free weekend to come to life?

    As you wrote big idea with a little “i”, and I’ve just finished pontificating on big “I” ideas, I’m going to say a Java compiler that does type inference. I haven’t done my homework and checked if something like this already exists, so if it does, please point me in its direction. Aside from the massive amount of typing (lol) it’d save, why should Scala get to have all the fun? A slightly more powerful compiler might lead to a better, if sloppier, place for the Java masses.

  5. How can we get more arguing on this blog?

    We could go find that youtube video of two kids having an argument and put it up here, thereby reclaiming the rest of the hits from Google searches for “two guys arguing”.

    If you’d like something software-related, pick the “social sciences” of programming, like methodologies, or productivity enhancements or what editor to use, as they often don’t have verifiable outcomes and thus can generate argument indefinitely.

    Personally, I’ve found that being wrong about a thing leads to more comments and discussion than solving that thing competently. On a positive note, I’d like to think that it has been our extreme competence in what we do that has kept this blog from rocketing us into the stratosphere of computer scientists around the world.

A Solution from Purgatory (for the Matrix from Hell)

Posted in challenge, math, questions by benjaminplee on 01.02.11

This post attempts to recount some of my thought process which eventually (more than a year later) lead to the Matrix from Hell puzzle I posted about a few days ago.  If you don’t want to know my answer, don’t read the rest of this post.  My final solution is at the bottom.

You have been warned.

~~~~~~~~~~~~~~~~~~~

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

~~~~~~~~~~~~~~~~~~~

Below is a list of my general thought process and the main ideas/revelations/halucinations that I know helped me to my solution.  An annotated matrix is at the bottom with some of the ideas highlighted in different colors.

  • Should indices start at 0 or 1?
  • Values along either axis count up by 1 in both directions when the other =0
  • Values are duplicated over the f(x)=y axis –> Order of indices given to function is inmaterial since they can be swapped
    [The RED 5 can be seen duplicated over the diagonal]
  • Calculation of max/min index is O(1) so stop worry about it and “forget” the bottom half of the matrix (assume X>= Y)
  • Values generally increase as they go right/down but not always
  • f(x,x)=0, f(x,0)=x
  • Values are symmetrical perpendicular to f(x)=y IFF the bounding square in question is a power of 2
  • “Quad” patterns can be seen repeating for every power of 2.  In each “Quad” the top left quadrant is the same as the bottom right and the top right is the same as the bottom left.
    [Each “Quad” of the example matrix is outlined in BLUE with the quadrants in LIGHT BLUE. PINK shows the repetition]
  • power of 2 repeating nature of the values leads to thinking of binary trees
  • binary trees leads to thinking about a possible solution which could run in O(log2 n) time:
    • In a nut shell, for a given (X,Y), algorithm would find smallest boudning power of 2 including X and would walk back “Quad”s by each successivly smaller power of 2 “Quad” until the last one is met.  For each “Quad” determine which quadrant the value lives in and remap to root values.
    • Debate if this would really work for large numbers and if it could be reduced to constant time function by math that I have forgotton from college/HS
    • Dismiss b/c I can’t remember a way and base idea is > O(1)
  • Binary trace back algorithm idea makes me realize that it can be also looked at as the “binary” representation of a number (e.g. 01011011010) where each 1 represents a “Quad” where the value was in the upper right or bottom left and vica-verca.
    [Following the matrix value at (5,6) can be seen ORANGE.  The value 3 is in the bottom right  of the 1st “Quad”, then BL, & TR. {BR BL TR} => {101} => {3}]
  • Remembered that f(x,y)=0 IFF x==y …. which means that difference between X and Y is important to value of f()
  • Eureka!
    • F(X, Y) = X xor Y
    • solution typed above in white on white to protect the innocent. highlight to view.

sample matrix with some of my thought process drawn out

That is it.  I never said it was pretty or made any sense.

Tagged with: , , , ,

2011.tech_resolutions.push(5.questions)

Posted in questions by benjaminplee on 01.01.11

In the spirit of the first posts to this blog and the new year I poste the following 5 questions to Mr Young…

  1. What are you currently hacking on?

  2. What do you want to hack on more in 2011?

  3. What do you want to stop hacking on in 2011?

  4. Big idea that just needs a free weekend to come to life?

  5. How can we get more arguing on this blog?

Tagged with: ,