Two Guys Arguing

error: object file is empty

Posted in git by youngnh on 10.14.13

A coworker couldn’t push one of his development branches to our company’s Stash server this afternoon. He first tried in the morning, but the server went down, so he stayed late until it was back online and tried again, only to be hit with a very cryptic error.

I stayed late debugging it with him, and here’s what we found.

When he tried to push his branch, he got an error saying that an object file was empty and that the repo was corrupt. We took that at face value and spent a lot of time exporting his patches and reimporting them into a freshly cloned repo. Once his commit history was back in shape, we ran `git push` again expecting smooth sailing, but got the same error message:

error: object file .git/objects/20/8529317841b2f25e1213f7e4f6ed3a665b7311 is empty
fatal: loose object 208529317841b2f25e1213f7e4f6ed3a665b7311 (stored in .git/objects/20/8529317841b2f25e1213f7e4f6ed3a665b7311) is corrupt

So the problem likely wasn’t corruption in his local repo. It took me some time to make the mental leap that the issue was probably in the only other repo involved in the transaction: the one on the server. Once that occurred to me, so did a way to verify it.

$ cat .git/objects/20/8529317841b2f25e1213f7e4f6ed3a665b7311
x+)JMU06`01??d???~???
??f???ih??l
?

Gobbledygook (loose objects are stored zlib-compressed), but the file isn’t empty.

$ git show abcd
tree 208529317841b2f25e1213f7e4f6ed3a665b7311

src/

It’s a tree object. Tree objects are immutable binary objects in the git repo that record the directory structure of a snapshot of the source. Trees point to other trees (directories) or blobs (files). I’m not familiar with git’s algorithms, but I am familiar with its data structures, and at this point I think I can mentally fill in the details of what went wrong. On the server, the .git/objects files were created, but for whatever reason their full data was never written to them. I imagine some process during the push walks the pushed commit’s tree, and when that walk reaches the corrupt tree object, it barfs.
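To make that failure mode concrete, here is a minimal Node.js sketch of how a loose object file can be decoded. It is an illustration, not git’s own code: the file under .git/objects/xx/ is zlib-deflated, and once inflated it holds a header of the form `<type> <size>`, a NUL byte, and then the raw content. An empty file, like the ones presumably left behind on our server, can’t be inflated at all. The function name `readLooseObject`, the size check, and the error wording are my own assumptions; the demo writes to a temporary directory rather than touching a real repository.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
const zlib = require("zlib");

// Decode one loose object file (the .git/objects/xx/yyyy... layout).
function readLooseObject(file) {
  const raw = fs.readFileSync(file);
  if (raw.length === 0) {
    // Mirrors the "object file ... is empty" error git printed for us.
    throw new Error(`object file ${file} is empty`);
  }
  const data = zlib.inflateSync(raw); // loose objects are zlib-compressed
  const nul = data.indexOf(0);        // the header ends at the first NUL byte
  const [type, size] = data.subarray(0, nul).toString("latin1").split(" ");
  const content = data.subarray(nul + 1);
  if (Number(size) !== content.length) {
    throw new Error(`object ${file} is corrupt: size mismatch`);
  }
  return { type, size: Number(size), content };
}

// Demo: one healthy blob object and one empty (never written) object file.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "loose-"));
const healthy = path.join(dir, "healthy");
const empty = path.join(dir, "empty");
const body = Buffer.from("hello\n");
fs.writeFileSync(healthy, zlib.deflateSync(Buffer.concat([Buffer.from(`blob ${body.length}\0`), body])));
fs.writeFileSync(empty, Buffer.alloc(0));

console.log(readLooseObject(healthy).type); // "blob"
try {
  readLooseObject(empty);
} catch (e) {
  console.log(e.message); // "object file <path> is empty"
}
```

Running `cat` on the healthy file shows the same kind of binary noise as above, because what’s on disk is the deflated stream rather than the tree entries themselves.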

The interesting thing to me was that because he recreated exactly the same files as the original commit, the tree and blob objects git stores had the same SHA-1 hashes. That prevented him from pushing his branch even though the commit hash differed (the commit was made at a different time), and even though he was pushing from a freshly cloned repo.

Wrapping Up
We trivially modified the files in the commit, mostly adding comments or whitespace, which changed the hashes of the blobs at the leaves. Trees are hashed based on their contents, which include their children’s hashes, so those changed too, all the way up to the commit’s root tree object. The push then walked a path of objects that weren’t corrupt, and it completed successfully.
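The hashing that made the whitespace trick work can be sketched in a few lines of Node.js. Git names each object by the SHA-1 of `<type> <size>\0<content>`, and a tree’s content is a sorted list of `<mode> <name>\0<20-byte hash>` entries, so a change to any blob changes every tree above it, while identical files always produce identical hashes. This is a simplified sketch: it handles only a flat tree of regular files (mode 100644) and skips git’s special sorting rule for subdirectories; `hashObject`, `hashTree`, and the sample file names are hypothetical.

```javascript
const crypto = require("crypto");

// SHA-1 of "<type> <size>\0<content>", the way git names every object.
function hashObject(type, content) {
  const header = Buffer.from(`${type} ${content.length}\0`);
  return crypto.createHash("sha1").update(Buffer.concat([header, content])).digest("hex");
}

// Hash a flat tree of regular files given as { name: "file contents" }.
// Simplification: no subdirectories, every entry gets mode 100644.
function hashTree(files) {
  const entries = Object.keys(files).sort().map((name) => {
    const blobHash = hashObject("blob", Buffer.from(files[name]));
    return Buffer.concat([
      Buffer.from(`100644 ${name}\0`),
      Buffer.from(blobHash, "hex"), // tree entries store the raw 20-byte hash
    ]);
  });
  return hashObject("tree", Buffer.concat(entries));
}

const original = { "App.java": "class App {}\n" };
const recreated = { "App.java": "class App {}\n" };            // same bytes, typed in again later
const touched = { "App.java": "class App {}\n// whitespace\n" }; // trivially modified

console.log(hashTree(original) === hashTree(recreated)); // true: same files, same tree hash
console.log(hashTree(original) === hashTree(touched));   // false: the leaf change propagates up
```

As a sanity check, `hashObject("blob", Buffer.alloc(0))` gives e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 and `hashTree({})` gives 4b825dc642cb6eb9a060e54bf8d69288fbee4904, git’s well-known ids for the empty blob and the empty tree.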

I’m not sure how our IT department can fix the corruption on the server. I tried corrupting a local repo by overwriting a tree object file, and that prevented me from running `git status`, `git gc` or `git fsck`, all of which failed with the same error. It’s possible that they could delete the exact .git/objects files named in the error messages we encountered. Possibly there’s also a flag that lets a push overwrite the object files on a remote, which would fix the issue too.
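Before deleting anything, whoever administers the server would first need a list of the empty object files. Assuming shell access to the repository on the server, a short Node.js script can report zero-length files under objects/ (for a bare repository, pass the repository directory itself as `gitDir`). The function `findEmptyLooseObjects` is hypothetical; it only reports and deletes nothing. On most systems, `find objects -type f -empty` does the same job.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// List loose objects of size zero under <gitDir>/objects/xx/.
// Reports only; it never deletes anything.
function findEmptyLooseObjects(gitDir) {
  const objectsDir = path.join(gitDir, "objects");
  const empties = [];
  for (const fanout of fs.readdirSync(objectsDir)) {
    if (!/^[0-9a-f]{2}$/.test(fanout)) continue; // skip pack/ and info/
    const dir = path.join(objectsDir, fanout);
    for (const rest of fs.readdirSync(dir)) {
      if (fs.statSync(path.join(dir, rest)).size === 0) {
        empties.push(fanout + rest); // reassemble the full 40-character id
      }
    }
  }
  return empties.sort();
}

// Demo against a throwaway directory laid out like a .git folder.
const gitDir = fs.mkdtempSync(path.join(os.tmpdir(), "objs-"));
fs.mkdirSync(path.join(gitDir, "objects", "20"), { recursive: true });
fs.mkdirSync(path.join(gitDir, "objects", "pack"), { recursive: true });
fs.writeFileSync(path.join(gitDir, "objects", "20", "8529317841b2f25e1213f7e4f6ed3a665b7311"), "");
console.log(findEmptyLooseObjects(gitDir)); // [ '208529317841b2f25e1213f7e4f6ed3a665b7311' ]
```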

I’m not sure I would have figured out what was going wrong or even thought to inspect git’s lower-level structures if I hadn’t read the excellent “Git from the bottom up”. I would highly recommend it, and will try and post back here when our server gets sorted.

QUnit-CLI: Running QUnit with Rhino

Posted in git, javascript, software testing by benjaminplee on 11.26.10

Previously I talked about wanting to run QUnit outside the browser and about some issues I ran into. Finally, I have QUnit running from the command line: QUnit-CLI.

After a good deal of hacking and a push from jzaefferer, I’ve gotten the example code in QUnit-CLI to run using Rhino, with no browser in sight. This isn’t a complete substitute for in-browser testing, but it makes integration with build servers, and faster feedback, possible.

/projects/qunit-cli/js suite.js
PASS - Adding numbers
FAIL - Subtracting numbers
PASS|OK|subtraction function exists|
FAIL|EQ|Intended bug!!!|Expected: 0, Actual: -2
PASS - module without setup/teardown (default)
PASS - expect in test
PASS - expect in test
PASS - module with setup
PASS - module with setup/teardown
PASS - module without setup/teardown
PASS - scope check
PASS - scope check
PASS - modify testEnvironment
PASS - testEnvironment reset for next test
PASS - scope check
PASS - modify testEnvironment
PASS - testEnvironment reset for next test
PASS - makeurl working
PASS - makeurl working with settings from testEnvironment
PASS - each test can extend the module testEnvironment
PASS - jsDump output
PASS - raises
PASS - mod2
PASS - reset runs assertions
PASS - reset runs assertions2
----------------------------------------
PASS: 22  FAIL: 1  TOTAL: 23
Finished in 0.161 seconds.
----------------------------------------

The first hurdle was adding guards around all of QUnit.js’s references to setTimeout, setInterval, and other browser- or document-specific objects. In addition, I extended the browser checks in test.js to cover all of the asynchronous tests and fixture tests. Finally, I cleaned up a bit of the jsDump code to work better with varying object call chains. My changes can be found on my fork of QUnit.
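The guards themselves are simple feature checks. The sketch below is hypothetical (it is not the code from my fork): it looks the timer function up on a host object and falls back to running the callback immediately when the environment, such as a bare Rhino shell, doesn’t provide one. Passing the host in explicitly is my own choice, so the fallback can be exercised under Node, where setTimeout does exist.

```javascript
// True if the host environment exposes a callable with this name.
function hostHas(host, name) {
  return host != null && typeof host[name] === "function";
}

// Run fn after `delay` ms when timers exist; otherwise run it synchronously.
function deferIfPossible(host, fn, delay) {
  if (hostHas(host, "setTimeout")) {
    return host.setTimeout(fn, delay);
  }
  fn(); // no timers (e.g. plain Rhino): execute right away
  return null;
}

// A bare object stands in for a host without timers; Node's globalThis has them.
const ran = [];
deferIfPossible({}, () => ran.push("sync"), 13);
console.log(ran); // [ 'sync' ]
deferIfPossible(globalThis, () => console.log("async"), 0);
```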

The second hurdle was getting QUnit-CLI to use my modified version of QUnit.js and adjusting how Rhino errors are handled. Adding a QUnit submodule to the QUnit-CLI git repository easily fixed the first (I previously posted my notes on git submodules and fixed branches). The second came from jsDump: QUnit.js borrows the jsDump code to “pretty-print” objects in test messages, and when jzaefferer ran QUnit’s own tests through QUnit-CLI, it produced this cryptic error:

js: "../qunit/qunit.js", line 1021: Java class "[B" has no public instance field or method named "setInterval".
at ../qunit/qunit.js:1021
at ../qunit/qunit.js:1002
at ../qunit/qunit.js:1085
at ../qunit/qunit.js:1085
at ../qunit/qunit.js:1085
at ../qunit/qunit.js:110
at ../qunit/qunit.js:712 (process)
at ../qunit/qunit.js:304
at suite.js:84

It turns out that error objects (e.g. ReferenceError) thrown in Rhino include an additional rhinoException property, which points to the underlying Java exception that was actually thrown. The error we saw is generated when the jsDump code walks the error object’s properties down to a byte array hanging off that exception. Any property lookup against this byte array throws the Java error above, even when it is part of a typeof check, e.g.

var property_exists = (typeof obj.property !== 'undefined');
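The reason the typeof guard doesn’t help is that `typeof obj.property` has to evaluate `obj.property` first; typeof only suppresses the error for an undeclared bare identifier. Node has no Java byte arrays, so the sketch below uses a Proxy whose property getter always throws as a stand-in (an assumption for illustration), and shows one defensive alternative, a try/catch around the lookup. `propertyExists` is a hypothetical helper, not part of QUnit.

```javascript
// Stand-in for Rhino's wrapped Java byte array: every property read throws.
const javaLike = new Proxy({}, {
  get(target, prop) {
    throw new Error(`no public instance field or method named "${String(prop)}"`);
  },
});

// typeof on a bare, undeclared identifier is safe...
console.log(typeof someUndeclaredName); // "undefined"

// ...but typeof on a property access still performs the access, so it throws.
try {
  const exists = typeof javaLike.setInterval !== "undefined";
  console.log(exists);
} catch (e) {
  console.log("threw:", e.message);
}

// Defensive version: treat "reading the property throws" as "not there".
function propertyExists(obj, name) {
  try {
    return typeof obj[name] !== "undefined";
  } catch (e) {
    return false;
  }
}

console.log(propertyExists(javaLike, "setInterval"));           // false
console.log(propertyExists({ setInterval: 1 }, "setInterval")); // true
```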

Once I figured this out, I wrapped the object parser inside QUnit.jsDump to properly pretty-print error objects and delegate to the original code for any other type of object.

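...
// Keep a reference to QUnit's original object parser so we can delegate to it.
var current_object_parser = QUnit.jsDump.parsers.object;
QUnit.jsDump.setParser('object', function(obj) {
  // Rhino error objects carry a rhinoException property pointing at the Java
  // exception; summarize them instead of walking into the Java object graph.
  if (typeof obj.rhinoException !== 'undefined') {
    return obj.name + " { message: '" + obj.message + "', fileName: '" + obj.fileName + "', lineNumber: " + obj.lineNumber + " }";
  }
  // Any other object goes to the original parser with the same `this` and arguments.
  return current_object_parser.apply(this, arguments);
});
...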
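Since QUnit.jsDump isn’t available outside QUnit, here is a self-contained sketch of the same wrap-and-delegate pattern against a tiny hypothetical dumper (`dumper`, its `parsers`, `parse`, and this `setParser` are stand-ins I made up, loosely modeled on the snippet above). The replacement parser summarizes error objects that carry a rhinoException property and forwards everything else, with the same `this` and arguments, to the parser it replaced.

```javascript
// Hypothetical minimal dumper with pluggable parsers, standing in for QUnit.jsDump.
const dumper = {
  parsers: {
    object(obj) {
      return "{ " + Object.keys(obj).map((k) => k + ": " + String(obj[k])).join(", ") + " }";
    },
  },
  setParser(name, parser) {
    this.parsers[name] = parser;
  },
  parse(obj) {
    return this.parsers.object.call(this, obj);
  },
};

// Wrap the original object parser; only Rhino-style errors get special treatment.
const originalObjectParser = dumper.parsers.object;
dumper.setParser("object", function (obj) {
  if (typeof obj.rhinoException !== "undefined") {
    return obj.name + " { message: '" + obj.message + "', fileName: '" +
      obj.fileName + "', lineNumber: " + obj.lineNumber + " }";
  }
  return originalObjectParser.apply(this, arguments);
});

// A fake Rhino error; in real Rhino, rhinoException would be a Java object.
const rhinoError = {
  name: "ReferenceError",
  message: "foo is not defined",
  fileName: "suite.js",
  lineNumber: 84,
  rhinoException: {},
};

console.log(dumper.parse(rhinoError));
// ReferenceError { message: 'foo is not defined', fileName: 'suite.js', lineNumber: 84 }
console.log(dumper.parse({ a: 1, b: 2 }));
// { a: 1, b: 2 }
```

Capturing the old parser before replacing it keeps the change additive: only error objects take the new path, and every other object is printed exactly as before.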

With these changes we have a decent command-line test suite runner for QUnit. With a bit more work, QUnit-CLI will hopefully be able to print Ant/JUnit-style XML output and/or include stack traces when errors bubble out of test code.


Tie Git Submodules to a Particular Commit or Branch

Posted in git, software development by benjaminplee on 11.14.10

While cleaning up and refactoring QUnit-CLI a bit, I realized I needed to tie the example code in the Git repository to a particular version of QUnit.js (those guys are making changes too fast for me to keep up). I have used svn:externals previously, so Git submodules seemed like an obvious solution. A single submodule should keep QUnit-CLI pointing at a particular revision of QUnit.js without requiring me to separately document which version I was testing against.

The man page for git-submodule and the Git Book chapter on submodules do a good job of documenting the command with some simple examples, but none were 100% clear for my needs. In addition, I need my submodule to point to a specific commit (or branch) so that everyone who clones my code can run the examples consistently, without fear that a new commit on HEAD will break something.

Step 1 : Add the submodule

Once the main repository is checked out, add the QUnit submodule. First grab the GitHub URL for my QUnit fork (eventually this will be replaced with the main QUnit repo), then run the “add” command from the root of your local repository.

git submodule add git://github.com/asynchrony/qunit.git qunit

Afterward there will be two new, staged entries in your repo: the .gitmodules file, which records the submodule’s local path and source URL, and a new folder named qunit, which contains a full clone of the source repository.

** Fraser Speirs has a good writeup on what is going on behind the scenes with the Git internals and how the key to all of this is in the index files of each repo and the modes the changes are committed with. **
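Concretely, the submodule shows up in the parent repository’s tree as a “gitlink”: an entry with the special mode 160000 whose hash is the commit checked out in the submodule, rather than a blob or a tree. The Node.js sketch below builds such an entry and shows that pointing the submodule at a different commit changes the parent’s tree hash, which is why the parent repo has something new to commit. The helper names, the single-entry tree, and the commit ids are all made up for illustration.

```javascript
const crypto = require("crypto");

// SHA-1 of "<type> <size>\0<content>", as git computes object ids.
function hashObject(type, content) {
  const header = Buffer.from(`${type} ${content.length}\0`);
  return crypto.createHash("sha1").update(Buffer.concat([header, content])).digest("hex");
}

// One tree entry: "<mode> <name>\0" followed by the raw 20-byte object id.
function treeEntry(mode, name, hexId) {
  return Buffer.concat([Buffer.from(`${mode} ${name}\0`), Buffer.from(hexId, "hex")]);
}

// Parent tree holding only the qunit submodule, recorded as a gitlink (mode 160000).
function parentTreeHash(submoduleCommit) {
  return hashObject("tree", treeEntry("160000", "qunit", submoduleCommit));
}

// Made-up ids standing in for two commits in the qunit repository.
const masterTip = "1111111111111111111111111111111111111111";
const devTip = "2222222222222222222222222222222222222222";

console.log(parentTreeHash(masterTip) !== parentTreeHash(devTip)); // true
```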

Step 2 : Fix the submodule to a particular commit

By default the new submodule is checked out at the latest commit on master, but it will NOT be updated as you update your primary repo. To make the submodule track a particular commit or a different branch, change directory into the submodule folder and switch branches just as you would in a normal repo.

git checkout -b dev_branch origin/dev_branch

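Now the submodule is checked out on the development branch instead of master. Just as easily, I could check out a specific commit or tag. Note that the primary repository records the exact commit the submodule is on, not the branch name, so later commits to dev_branch aren’t picked up until you update the submodule and commit again.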

Step 3 : Commit everything

Back in your primary repository you still have two modified entries: the .gitmodules file and the qunit folder. Committing these changes persists the new submodule, pinned to the commit you checked out.

Step 4 : Clone Recursive

The next time you (or anyone else) clone this repo, you will need to do one of two things.

A) Add the --recursive flag to the git clone command

git clone REPO_URL --recursive

B) Manually initialize and update the submodules after the clone

git clone REPO_URL
git submodule update --init --recursive

hold my code for a second…

Posted in git by youngnh on 01.25.10

add, stash, edit, apply and reset

This is mostly a note on how I work with git. A lot of people used to Subversion wonder why I’m so enamored of Linus Torvalds’s latest project, and I usually explain only enough of it to convince them that git is an elaborate, complicated tool for replicating a Subversion workflow. This is a good example of a situation in which svn would fail you.

I try as much as possible to create a new branch every time I start a new unit of work. Oftentimes I absent-mindedly start coding and am 4 or 5 commits deep before realizing that I forgot to branch (a situation git has tools to fix as well), but for this particular stretch of coding I was on a clean branch without any untracked files.

Shortly after starting, I found myself with 4 classes, none of which compiled. In a minimalist vein, at home I use only emacs and ant when hacking on Java code. Compiling with ant can cause a bit of scroll blindness when your code has compile errors, and the situation only gets worse when you don’t quite get Java generics.

One class was named WinLogicFactory, and its functionality was the end goal of this branch. The other 3 were support classes: they had reusable, general-purpose functionality that WinLogicFactory would use, but contained nothing specific to the application I was writing. I stole the idea for them from Haskell, and they were named Either, Left and Right.

$ git status
# On branch winlogic_factory
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#	src/com/twoguys/rps/WinLogicFactory.java
#	src/com/twoguys/util/Either.java
#	src/com/twoguys/util/Left.java
#	src/com/twoguys/util/Right.java

Until Either, Left and Right were fixed and compiling, I couldn’t fix WinLogicFactory, which uses them. A bit counter-intuitively, I added WinLogicFactory to the index. Because git stash saves only tracked changes by default and leaves untracked files alone, this let me stash the index, taking WinLogicFactory with it. That left only the generic classes in my working tree:

$ git add src/com/twoguys/rps/WinLogicFactory.java
$ git stash
$ git status
# On branch winlogic_factory
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#	src/com/twoguys/util/Either.java
#	src/com/twoguys/util/Left.java
#	src/com/twoguys/util/Right.java

Once I got Either, Left and Right compiling, I could start working on WinLogicFactory, so git stash apply restored the index with my broken factory in it. I still had more work to do to get it compiling, and to convince myself that I had really gotten my workspace back to its original state, I ran git reset, which pulled any uncommitted changes back out of the index (unstaging them) while leaving the files themselves in the working tree.

$ git stash apply
$ git reset
$ git status
# On branch winlogic_factory
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#	src/com/twoguys/rps/WinLogicFactory.java
#	src/com/twoguys/util/Either.java
#	src/com/twoguys/util/Left.java
#	src/com/twoguys/util/Right.java

At this point, I could fix WinLogicFactory without worrying about simultaneous errors from Either, Left and Right.

Things that are worth noting:

  • staging content in git’s index is not simply another step that you have to perform before you can commit files you’ve edited. It’s how you tell git about any file you care about.
  • git can manipulate code that isn’t committed. Commits are merely a little extra information attached to otherwise generally useful structures that git was already creating for you.
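To illustrate that last point, here is a Node.js sketch of what a commit object contains: a few lines of text naming a tree by its id, optional parent ids, author and committer lines with timestamps, a blank line, and the message. The blobs and trees it refers to already exist by the time the commit is written. The identity and timestamps are invented, `commitContent` is a hypothetical helper, and the sketch omits optional headers such as signatures. It also shows that two commits of the same tree made at different times get different ids.

```javascript
const crypto = require("crypto");

// SHA-1 of "<type> <size>\0<content>", as git computes object ids.
function hashObject(type, content) {
  const header = Buffer.from(`${type} ${content.length}\0`);
  return crypto.createHash("sha1").update(Buffer.concat([header, content])).digest("hex");
}

// Build the text of a commit object; the tree is referenced only by its id.
function commitContent({ tree, parents = [], who, timestamp, message }) {
  const lines = [`tree ${tree}`];
  for (const p of parents) lines.push(`parent ${p}`);
  lines.push(`author ${who} ${timestamp} +0000`);
  lines.push(`committer ${who} ${timestamp} +0000`);
  return Buffer.from(lines.join("\n") + "\n\n" + message + "\n");
}

// The empty tree's well-known id keeps the example self-contained.
const tree = hashObject("tree", Buffer.alloc(0));
const who = "A. Developer <dev@example.com>"; // invented identity
const morning = commitContent({ tree, who, timestamp: 1264400000, message: "Add WinLogicFactory" });
const evening = commitContent({ tree, who, timestamp: 1264430000, message: "Add WinLogicFactory" });

console.log(tree); // 4b825dc642cb6eb9a060e54bf8d69288fbee4904
console.log(hashObject("commit", morning) === hashObject("commit", evening)); // false: only the time differs
```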