Dealing with Data Corruption with GIT, or, I’m Famous!
I store all the data I care about in git. A while back I had just updated a few files in one of my repositories when my computer kernel panicked. Right before the panic I was getting ready to push my changes to a backup repository on another machine. After rebooting I went to push them, and git noticed what turned out to be data corruption.
One minor aside: had I been using a VCS that does not hash whatever I put into it, I probably wouldn’t have had to deal with this problem right away. But then I would only have noticed the corruption much later, after forgetting all the changes and running into some hard-to-diagnose problem. Dealing with this up front is definitely better.
This prompted a thread on the git mailing list (the post contains a detailed description of the symptoms). The problem was that two git objects corresponding to two different, recent commits in my repository had been corrupted. Now, there are several ways to proceed when dealing with corruption, in ascending order of difficulty:
1. Copy the corrupted objects from a backup repository.
2. Replace the offending commit with a new commit altogether, re-creating approximately the same changes.
3. Re-create exactly the changes that turn the commit-before-offending-commit into the offending commit.
I couldn’t take option 1, because I didn’t have a backup, and I couldn’t take option 3, because I didn’t remember the exact changes between the previous and offending commits. So I had to go with option 2, which, incidentally, is not covered explicitly in the relevant section of the manual. Since I usually commit early and often, it was pretty easy for me to reproduce these changes.
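For completeness, option 1 is mostly a matter of knowing where the object lives. Git stores a loose (unpacked) object in a file under .git/objects, using the first two hex digits of its ID as a directory name and the remaining 38 as the file name. Here is a minimal sketch, assuming the object is loose in the backup rather than inside a packfile. loose_object_path is a hypothetical helper, and the ID is the one git fsck reports later in this post:

```shell
# Hypothetical helper: print the path of the loose object file for a
# SHA-1 ID, relative to a repository's .git directory.
loose_object_path() {
    sha=$1
    rest=${sha#??}        # everything after the first two characters
    dir=${sha%"$rest"}    # the first two characters
    printf 'objects/%s/%s\n' "$dir" "$rest"
}

loose_object_path 320bd6e82267b71dd2ca7043ea3f61dbbca16109
# prints objects/32/0bd6e82267b71dd2ca7043ea3f61dbbca16109
```

With a healthy backup, copying that one file from the backup's .git directory to the same relative path in the damaged repository should be enough for git fsck to find the object again.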
Replacing a Corrupted Commit
As noted in the thread, my git repository was in the following state:
$ git fsck --full
error: 320bd6e82267b71dd2ca7043ea3f61dbbca16109: object corrupt or missing
missing blob 320bd6e82267b71dd2ca7043ea3f61dbbca16109
Jakub Narebski was kind enough to explain with diagrams how this is done. Assume that A is the SHA1 ID of the commit preceding the corrupt commit, and B is the ID of the commit immediately following it. First, create a new branch, here called corruption, whose head is the commit before the corrupt commit; then re-create the lost changes on that branch and commit them.
$ git checkout -b corruption A
$ ... edit edit edit ...
$ git commit -a -m <something-like-corrupted-commit-msg>
The next step is teaching the git repository to ignore the corrupted commit. To accomplish this we use the undocumented grafts file. Conceptually, this file is extremely simple. It consists of any number of entries, one per line. Each entry is of the form:
<sha1-id> <sha1-id>*
that is, a SHA1 ID followed by zero or more IDs. The effect is that, in your local repository, git treats the first named commit as having exactly the parents given. This lets us trick git into treating the new commit, rather than the corrupt one, as the parent of B, by adding the entry:
$ echo B '<new-commit>' >> .git/info/grafts
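Since a malformed line in the grafts file is easy to write by hand, a small check before appending may help. The following is only a sketch: graft_entry is a hypothetical helper, not part of git, and it assumes full 40-character lowercase SHA1 IDs rather than abbreviations.

```shell
# Hypothetical helper: print one grafts-file entry, "<commit> <parent>...".
# Fails if any argument does not look like a full 40-character SHA1 ID.
graft_entry() {
    if [ "$#" -lt 1 ]; then
        echo "usage: graft_entry <commit> [<parent>...]" >&2
        return 1
    fi
    for id in "$@"; do
        case "$id" in
            *[!0-9a-f]*) echo "not a hex ID: $id" >&2; return 1 ;;
        esac
        if [ "${#id}" -ne 40 ]; then
            echo "not 40 characters: $id" >&2
            return 1
        fi
    done
    echo "$*"    # the IDs joined by single spaces
}

# Usage, with B and the new commit given as full IDs:
#   graft_entry "$B" "$NEW_COMMIT" >> .git/info/grafts
```

Passing only the first ID produces an entry with no parents, which matches the "zero or more IDs" form described above.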
At this point, switch back to master.
$ git checkout master
$ git fsck --full
There should be no errors. (There might be warnings.)
Unfortunately, this is not the whole story. The grafts file is purely a local measure. Every clone of this repository will still have the corruption. So we have to teach git to write the grafts information directly into the history. Enter git filter-branch.
Replacing a Corrupted Commit For All Time
git filter-branch rewrites history while allowing filters to alter the history. We’ll use it to carve the grafts file into an actual git history.
(on master) $ git filter-branch HEAD
Rewrite (3/3)
Ref 'refs/heads/master' was rewritten
Now the clones should use the new history. Voilà!
You shouldn’t lie about being famous
Behold, recorded for all time in the git source tree:
commit e9039dd35194b7c1cf4ecd479928638166b8458f
Author: Linus Torvalds
Date: Tue Jun 10 18:47:18 2008 -0700
Consolidate SHA1 object file close
This consolidates the common operations for closing the new
temporary file that we have written, before we move it into
place with the final name.
There's some common code there (make it read-only and check
for errors on close), but more importantly, this also gives a
single place to add an fsync_or_die() call if we want to add
a safe mode.
This was triggered due to Denis Bueno apparently twice being
able to corrupt his git repository on OS X due to an unlucky
combination of kernel crashes and a not-very-robust
filesystem.
Signed-off-by: Linus Torvalds
Signed-off-by: Junio C Hamano
Also, I think I’m the reason for a recent patch adding a new git configuration option, core.fsyncobjectfiles, described as:
This is a total waste of time and effort on a filesystem that orders data writes properly, but can be useful for filesystems that do not use journalling (traditional UNIX filesystems) or that only journal metadata and not file contents (OS X’s HFS+, or Linux ext3 with “data=writeback”).
Configuring Emacs for Gmail’s SMTP over SSL
When I finally convinced emacs to send mail with Gmail it was only after I had visited 6 or 7 websites on the topic, and inferred the solution. I’m posting here in hopes that at least one resource has all the information in one place. The task is to get Emacs to send mail using your Gmail account, over SSL.
Install starttls (gnutls)
According to Fink, starttls is a “simple wrapper program for STARTTLS protocol”, which is what implements SMTP-over-SSL. This step was neglected by every blog entry I read on this topic. Even the emacs error messages given when attempting to send email without starttls available don’t indicate that it’s necessary, let alone missing. I had to poke around smtpmail.el to find out that such a program even existed.
Update 16 Jan 2008: As mentioned in the comments, installing gnutls also works. Update 09 Dec 2008: You may also need to customise the variable starttls-use-gnutls (loaded by the starttls package); for example:
(setq starttls-use-gnutls t)
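Because emacs does not say when the helper program is missing, it may be worth checking from a shell first. The following is a rough sketch: find_tls_helper is a hypothetical function, and it assumes the programs are installed under the names starttls and gnutls-cli (the command-line client that ships with gnutls).

```shell
# Hypothetical helper: report which of the given programs are on PATH.
# Returns 0 if at least one was found, 1 otherwise.
find_tls_helper() {
    status=1
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "found: $prog"
            status=0
        fi
    done
    return "$status"
}

find_tls_helper starttls gnutls-cli ||
    echo "neither starttls nor gnutls-cli is on PATH"
```

If only gnutls-cli turns up, that is the case where setting starttls-use-gnutls as shown above is likely to matter.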
Configure emacs
I eventually settled on the following settings for my .emacs:
(setq send-mail-function 'smtpmail-send-it
      message-send-mail-function 'smtpmail-send-it
      smtpmail-starttls-credentials
      '(("smtp.gmail.com" 587 nil nil))
      smtpmail-auth-credentials
      (expand-file-name "~/.authinfo")
      smtpmail-default-smtp-server "smtp.gmail.com"
      smtpmail-smtp-server "smtp.gmail.com"
      smtpmail-smtp-service 587
      smtpmail-debug-info t)
(require 'smtpmail)
The smtpmail package is included in emacs 22. I’m not sure if it is included with any earlier version.
The file .authinfo is in netrc format. It’s for storing your authentication information. Here are some quick-start directions:
- Create the file .authinfo in your home directory.
- Insert the following:
machine smtp.gmail.com login [your name]@gmail.com password [your actual password]
- Set its permissions so only your user can read and write it. I did this with:
chmod go-rwx .authinfo
chmod u+rw .authinfo
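The steps above can also be done in one go, so that the file never exists with looser permissions. This is only a sketch: it writes to a temporary directory instead of your home directory, and the login and password are placeholders.

```shell
# Sketch: create a netrc-style file that only its owner can read or write.
# The directory and the credentials below are placeholders, not real values.
dir=$(mktemp -d)
authinfo="$dir/.authinfo"
(
    umask 077    # files created in this subshell get no group/other access
    printf 'machine smtp.gmail.com login %s password %s\n' \
        'someone@gmail.com' 'placeholder-password' > "$authinfo"
)
ls -l "$authinfo"    # the mode column should read -rw-------
```

The umask is set inside a subshell so that it does not affect anything else you do in the same terminal.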
Now, you should be able to run emacs and test it out by composing an email with C-x m, and sending with C-c C-c. Please feel free to post a comment if you have problems. And be sure to set the variable smtpmail-debug-verb to t while you’re debugging: it makes the *Messages* buffer include much more useful information.
Update 20 Jul 2007: Removed newlines in suggested .authinfo contents; I think the file doesn’t parse correctly with them, at least under certain conditions.
Update 13 Dec 2008: Reworked the configuration so it always uses .authinfo — apparently Google doesn’t like it any other way. This came about because I tried using the non-.authinfo config on my machine, and Gmail kept rejecting mail to send. As soon as I started using .authinfo, it worked.
Changing SVN working copy URL
So feel free to ignore it. However, anyone who cares about changing a subversion working copy’s URL will think this is the best post ever. Here’s how to use find and perl to do it.
To replace the file:// prefix with svn+ssh://hostname, type the following in the root directory of the subversion working copy:
find . -name entries -type f -exec perl -pi.orig -e 's!file://!svn+ssh://hostname!' {} \;
(This also saves a backup of each modified file as filename.orig.) Of course, feel free to substitute another URL prefix for either of those above.
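Before rewriting anything, it may be worth seeing which files would actually be touched. Here is a small sketch using only find and grep; entries_with_prefix is a hypothetical helper name.

```shell
# Hypothetical helper: list the "entries" files under directory $1 that
# contain the literal string $2 (for example an old URL prefix).
entries_with_prefix() {
    find "$1" -name entries -type f -exec grep -lF -- "$2" {} +
}

# Usage, from the root of the working copy:
#   entries_with_prefix . 'file://'
```

Running it again after the perl command should print nothing, since the old prefix is gone from every entries file.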