The Git Index and Recovering Files

May 9, 2014

This question on StackOverflow asks about recovering files that were added to the Git index using git add but were subsequently removed from the index. I provided an answer for that direct question, since you can recover these changes, but I also wanted to dig a little deeper into what happens when you add something to the index.

When I git add a file, Git adds that file to the object database and places the object information in the staging area. For example, I can create a new file, git add it to my staging area, and then examine the staging area with the git ls-files --stage command to see the details (including the object ID) of what's staged:

% echo "new file" > newfile.txt
% git add newfile.txt
% git ls-files --stage
100644 40ee2647744341be918c15f1d0c5e85de4ddc5ed 0       file.txt
100644 3748764a2c3a132adff709b1a6cd75499c11b966 0       newfile.txt

So this file is a normal git blob at this point, and lives inside the git repository, even though I haven't committed these changes yet:

% ls -Fls .git/objects/37
total 1
   1 -r--r--r--    1 ethomson Administ       26 May  9 09:26 48764a2c3a132adff709b1a6cd75499c11b966
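To check the link between the index entry and the object database without listing .git/objects by hand, here is a minimal sketch. It assumes a throwaway repository in a temporary directory (not part of the original example) and compares the ID recorded in the index with the ID that git hash-object computes for the file:

```shell
# Sketch: runs in a scratch repository created with mktemp (an assumption, not from the post)
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

echo "new file" > newfile.txt
git add newfile.txt

# Object ID recorded in the index (second field of ls-files --stage)
staged_id=$(git ls-files --stage newfile.txt | awk '{print $2}')

# Object ID Git computes from the file's current contents
hashed_id=$(git hash-object newfile.txt)

# Ask the object database what it stores under that ID
obj_type=$(git cat-file -t "$staged_id")
obj_body=$(git cat-file -p "$staged_id")

echo "staged: $staged_id"
echo "hashed: $hashed_id"
echo "type:   $obj_type"
echo "body:   $obj_body"
```

The two IDs should match, and git cat-file should report a blob, which is the same object that appears as a file under .git/objects.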

Because Git has already created a blob and recorded it in the index, you can stage changes to a file, keep editing that file, and then commit: only the staged changes will be committed, not the subsequent, unstaged modifications.

If I append some data to this file, it will have both staged changes and unstaged changes:

% echo an addendum >> newfile.txt
% git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        new file:   newfile.txt

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   newfile.txt
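To see exactly what is staged versus what is only in the working directory, the following sketch may help. It again assumes a scratch repository in a temporary directory; the `:newfile.txt` syntax asks git show for the version of the file recorded in the index:

```shell
# Sketch: scratch repository created with mktemp (an assumption, not from the post)
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

echo "new file" > newfile.txt
git add newfile.txt
echo an addendum >> newfile.txt

# ":path" names the copy of the file stored in the index (the staged blob)
staged_body=$(git show :newfile.txt)

# The working directory copy includes the unstaged addendum
working_body=$(cat newfile.txt)

# git diff with no commit argument compares the working directory to the index
unstaged_diff=$(git diff --no-color -- newfile.txt)

echo "staged:  $staged_body"
echo "working: $working_body"
echo "$unstaged_diff"
```

The staged copy should contain only the first line, while the diff shows the addendum as an added line that has not been staged.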

If I were to commit right now, only the staged changes would be committed, leaving the unstaged changes behind. But what if I forgo these changes entirely and do a git reset --hard? In that case, my index and working directory are reset to HEAD, so newfile.txt disappears from both, but the blob is kept in the object database, so I can recover if I had mistakenly unstaged this:

% git reset --hard HEAD
% ls -Fls .git/objects/37
total 1
   1 -r--r--r--    1 ethomson Administ       26 May  9 09:26 48764a2c3a132adff709b1a6cd75499c11b966
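If I did happen to note the object ID before the reset, I could confirm that the blob survives without looking inside .git/objects. This is a minimal sketch under a few assumptions: a scratch repository, an initial commit so that HEAD exists, and placeholder author details passed with -c:

```shell
# Sketch: scratch repository; the commit identity below is a placeholder
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# An initial commit is needed so that "git reset --hard HEAD" has a HEAD to reset to
echo "tracked" > file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.com \
    -c commit.gpgsign=false commit -q -m "initial commit"

echo "new file" > newfile.txt
git add newfile.txt
staged_id=$(git ls-files --stage newfile.txt | awk '{print $2}')

git reset -q --hard HEAD

# After the reset, the index no longer lists newfile.txt
index_entry=$(git ls-files newfile.txt)

# cat-file -e exits successfully only if the object still exists
if git cat-file -e "$staged_id"; then
    blob_state="present"
else
    blob_state="missing"
fi

echo "index entry: '${index_entry}'"
echo "blob $staged_id is $blob_state"
```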

Generally, though, I won't know the object ID of the file I've just misplaced, so I would use the git fsck tool. It performs an integrity check of the Git repository and shows me any objects that are not "reachable", either because they were part of a commit that is no longer on a branch, or because I git added a file and never committed it. My newfile.txt is one of these unreachable objects:

% git fsck
Checking object directories: 100% (256/256), done.
dangling blob 3748764a2c3a132adff709b1a6cd75499c11b966

Unfortunately, its filename is not stored in the object database (since identical contents have the same object ID regardless of name), so if you have many dangling blobs, you will have to examine each one:

% git show 3748764
new file
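When there are many dangling blobs, examining them one at a time gets tedious. Here is a rough sketch of a loop that prints the first few lines of each one. It assumes a scratch repository with a placeholder commit identity, and the five-line preview is an arbitrary choice:

```shell
# Sketch: scratch repository; the commit identity below is a placeholder
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

echo "tracked" > file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.com \
    -c commit.gpgsign=false commit -q -m "initial commit"

# Stage a file, then throw it away so its blob becomes dangling
echo "new file" > newfile.txt
git add newfile.txt
git reset -q --hard HEAD

# Collect the IDs that fsck reports as dangling blobs
dangling=$(git fsck 2>/dev/null | awk '$1 == "dangling" && $2 == "blob" {print $3}')

# Preview the first lines of each candidate
for id in $dangling; do
    echo "== $id"
    git cat-file -p "$id" | head -n 5
done
```

In a real repository I would run only the fsck and loop portion, then pick out the blob whose preview matches the file I lost.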

Once I determine which dangling blob I want to recover, I can put it back on the filesystem by redirecting the output of git show:

% git show 3748764 > newfile.txt

And the file is recovered!
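As an alternative to redirecting git show, git fsck has a --lost-found option that writes dangling objects into .git/lost-found, with blob contents placed in files under .git/lost-found/other/ named by object ID. A minimal sketch, assuming a scratch repository and a placeholder commit identity:

```shell
# Sketch: scratch repository; the commit identity below is a placeholder
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

echo "tracked" > file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.com \
    -c commit.gpgsign=false commit -q -m "initial commit"

echo "new file" > newfile.txt
git add newfile.txt
staged_id=$(git ls-files --stage newfile.txt | awk '{print $2}')
git reset -q --hard HEAD

# Write every dangling object into .git/lost-found
git fsck --lost-found >/dev/null 2>&1

# Dangling blobs are written as plain files named after their object ID
recovered=".git/lost-found/other/$staged_id"
cp "$recovered" newfile.txt
cat newfile.txt
```

This can be handy when several files were lost at once, since each blob ends up as an ordinary file that I can open, rename, and copy back into place.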