
Benjamin Oakes


Hi, I'm Ben Oakes and this is my geek blog. Currently, I'm a Ruby/JavaScript Developer at Liaison. Previously, I was a Developer at Continuity and Hedgeye, a Research Assistant in the Early Social Cognition Lab at Yale University and a student at the University of Iowa. I also organize TechCorridor.io, ICRuby, OpenHack Iowa City, and previously organized NewHaven.rb. I have an amazing wife named Danielle Oakes.

Google announces the first practical technique for generating a SHA-1 collision

by Ben

This is big news.

We hope that our practical attack against SHA-1 will finally convince the industry that it is urgent to move to safer alternatives such as SHA-256.

Source: Announcing the first SHA1 collision – Google Online Security Blog

The technology community still uses SHA-1 for many things.  One of the most concerning implications of this team’s technique is that it implies attacks against Git, which uses SHA-1 for every commit.  Imagine if you had a tag (a SHA-1 sum) that referred to two different sets of changes: a benign changeset on your machine and a malicious changeset on GitHub.  Then you deploy that tag and the malicious code runs instead of the code you expected.

As far as I know, such an attack on Git hasn’t been demonstrated yet, but in theory, I think you could replace a SHA-1 commit as I described.  I bet someone will demonstrate that someday.  (Think of padding files with bogus comments until you get the checksum you want.)  It would be difficult (though not impossible) to switch Git to SHA-256, but I don’t know of any efforts to do that — though Git 2.11 is starting to acknowledge that abbreviated SHA-1 checksums do collide in practice.
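To make the role of SHA-1 a bit more concrete, here is a minimal sketch of how Git derives the ID of a file's contents (a "blob"). It assumes `git` and GNU `sha1sum` are installed and that you are not inside a repository configured for a different hash; the `content` value is just an illustrative string.

```shell
# Illustrative content; any string works.
content='hello world'

# Git hashes a small header ("blob <size>" plus a NUL byte) followed by the content.
size=$(printf '%s' "$content" | wc -c | tr -d ' ')
manual=$( { printf 'blob %s\0' "$size"; printf '%s' "$content"; } | sha1sum | cut -d' ' -f1 )

# Ask git to compute the same ID for comparison.
via_git=$(printf '%s' "$content" | git hash-object --stdin)

echo "manual:  $manual"
echo "via git: $via_git"
```

The two values should match, which is the point: an object's ID is nothing more than a SHA-1 sum, so two different contents with the same sum would be indistinguishable by ID.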

Will such an attack happen today or tomorrow?  Probably not; it takes a huge amount of resources right now.  However, computation is cheaper than ever; I bet attackers will start to use services like Travis CI for computations like this, much as I’ve heard is already happening with Bitcoin mining in pull requests on open source projects.

The best mitigation I’m currently aware of is cryptographically signing your commits, and this may be a catalyst for that to become standard practice.
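As a rough sketch of what signing looks like in practice, the commands below turn on commit signing for a single throwaway repository. The key ID is a placeholder (an assumption, not a real key), and the signing and verification commands are left as comments because they only succeed once a GPG key is actually set up.

```shell
# Throwaway repository, so no global configuration is touched.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Hypothetical key ID: substitute one listed by `gpg --list-secret-keys`.
signing_key="0123456789ABCDEF"
git config user.signingkey "$signing_key"

# Ask git to sign every commit made in this repository.
git config commit.gpgsign true
signing_enabled=$(git config --get commit.gpgsign)
echo "commit.gpgsign = $signing_enabled"

# With a real key configured, these would create and check signed objects:
#   git commit -S -m "a signed commit"
#   git verify-commit HEAD
#   git tag -s v1.0 -m "a signed tag"
#   git verify-tag v1.0
```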

How to Save the Day with Git Remotes

by Ben

This post originally appeared on Continuity’s engineering blog. I’ve cross-posted it here for posterity.

Can’t push commits to GitHub, but need to move them to another clone? Adding another remote is a good option to get you out of a bind.

Why did this come up?

We do a lot of pair programming at Continuity. One of the great things about it is that if someone is out sick, you can usually depend on the other person to know the details of what was happening. But what if both people are out sick and they forgot to push their branch to GitHub and the deadline is today?

(That happened last week!)

The solution described in this post is a little specific to how we pair program remotely. Our team is almost 50% remote developers, and we all share access to the same EC2 instance. We use wemux and vim to pair (among other tools), and the provisioning is automated by puppet.

Back to the tale of the missing pair, one idea that came up was installing another SSH key to the missing person’s ~/.ssh/authorized_keys, but that has a number of problems:

Finally, installing keys on another user’s account just feels like a practice to avoid, if for no other reason.

Instead, what we decided to do was to become their user via sudo (which is logged) and then push to a local clone of the git repository. That might sound complicated if you’ve only ever used git in combination with GitHub, but it’s actually a lot simpler than you’d think.

The Plan

What we’re going to do is push the absent user’s code to a shared directory, then pull it to your clone of the repository.

We’ll refer to the absent user as them and the user that needs access to the code as you.

How to

Just to set the stage, let’s look at the normal remotes for a clone from GitHub:

them$ git remote -v
origin  git@github.com:your-organization/your-repo.git (fetch)
origin  git@github.com:your-organization/your-repo.git (push)

What we’re going to do is make a remote on the local filesystem.

We’ll start by cloning into a shared directory (/tmp in this case):

them$ mkdir -p /tmp/git
them$ cd /tmp/git
them$ git clone git@github.com:your-organization/your-repo.git

We know that the absent user has a branch with code you need:

them$ git branch
  master
  develop
* feature/lol-i-will-be-here-tomorrow

So we’ll push it to that repo on the local filesystem:

them$ git remote add my-awesome-local-git-repo /tmp/git/your-repo/.git
them$ git remote -v
my-awesome-local-git-repo       /tmp/git/your-repo/.git (fetch)
my-awesome-local-git-repo       /tmp/git/your-repo/.git (push)
origin  git@github.com:your-organization/your-repo.git (fetch)
origin  git@github.com:your-organization/your-repo.git (push)
them$ git push my-awesome-local-git-repo feature/lol-i-will-be-here-tomorrow

Then you can go back to your user account and pull it.

But first, you’ll need to make sure the /tmp/git/your-repo files are accessible. Since it’s a temporary clone, you could do something like this:

you$ sudo chown -R you:you /tmp/git/your-repo

And then go back to your clone, add the remote, and check out the branch:

you$ git remote add my-awesome-local-git-repo /tmp/git/your-repo/.git
you$ git fetch my-awesome-local-git-repo
you$ git checkout feature/lol-i-will-be-here-tomorrow

And there’s the branch you needed! We’re all done in our scenario, so you could now rm -rf /tmp/git.

Summary

Git allows you to do a lot of interesting things like have a remote on the local filesystem, as described above:

$ git remote add my-awesome-local-git-repo /tmp/git/your-repo/.git

The remote on the local filesystem acts just like any other remote; it just happens to be hosted locally. There’s very little magic going on! It’s just a pile of files that git manages.
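If you want to try the idea without touching a real project, here is a self-contained sketch that plays out the whole scenario in temporary directories. The directory names, the branch name, and the remote name `shared` are all made up for illustration, and it uses a bare repository as the shared location rather than the `.git` directory of a clone as in the walkthrough above.

```shell
set -e
work=$(mktemp -d)

# "them": a repository with a feature branch that was never pushed anywhere.
git init -q "$work/them"
cd "$work/them"
git config user.name "Them"
git config user.email "them@example.com"
git commit -q --allow-empty -m "initial commit"
git checkout -q -b feature/unpushed-work
echo "important change" > change.txt
git add change.txt
git commit -q -m "work that never reached GitHub"

# Shared location: a bare repository on the local filesystem, added as a remote.
git init -q --bare "$work/shared.git"
git remote add shared "$work/shared.git"
git push -q shared feature/unpushed-work

# "you": a separate repository that fetches the branch from the same path.
git init -q "$work/you"
cd "$work/you"
git remote add shared "$work/shared.git"
git fetch -q shared
git checkout -q -b feature/unpushed-work shared/feature/unpushed-work
fetched=$(cat change.txt)
echo "$fetched"
```

Nothing here talks to a network; both repositories simply read and write the files under the shared path.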

Another useful type of remote you can make yourself is a clone on another machine that’s accessible over SSH (and has the required git executables in $PATH):

$ git remote add my-awesome-remote-git-repo ssh://your-server/path/to/your-repo.git

That’s a lot like a private GitHub repo, but without any of the pretty web user interface or authorization management.

I hope this example helps illustrate that git is very flexible and can bend to your needs. If you’ve only ever used GitHub as a remote when using git, you’re missing out on some useful functionality! However, it’s still best to have all your code in a central location, even though you could add a thousand remotes. Like any tool, know when to use it!

The case for a monolithic repository

by Ben

Gregory Szorc’s Digital Home | On Monolithic Repositories.
Gregory Szorc’s Digital Home | Notes from Facebook’s Developer Infrastructure at Scale F8 Talk.

I’ve seen a lot written about reasons why your organization should keep a monolithic repository instead of a collection of many smaller repositories, especially for internal code (non-OSS). I’ve had similar experiences to what is described in these posts.

And before you think “we’re getting too big for that,” keep in mind that these recommendations are coming out of big players like Facebook and Google.

Convert bzr to git

by Ben

Convert bzr to git | AstroFloyd's blog.

I found a couple bzr repositories on my computer recently that I decided to convert to git. I found this nice writeup on how to convert.

On Ubuntu:

sudo apt-get install git bzr bzr-fastimport

Then:

repo=repo-dir  # directory containing the bzr repository
cp -pr "${repo}" "${repo}_backup"
cd "${repo}"
git init
bzr fast-export --plain . | git fast-import
git checkout -f master
rm -rf .bzr/

Recipe: git bisect

by Ben

This post originally appeared on Continuity’s engineering blog. I’ve cross-posted it here for posterity.

Git ships with an awesome, underused utility called git-bisect. I had a bug to track down today that already had a spec, so it was a perfect fit. Normally our continuous integration (CI) service would have alerted us earlier, but unfortunately the failure was masked by another problem.

Ingredients

Directions

Prepare the test executable

In this case, I’ve called it private/git-bisect.sh and filled it with this:

#!/bin/sh
# Don't forget to `chmod +x` this file.
# You can add more steps here if necessary, e.g. installing dependencies.
rspec spec/services/my_service_spec.rb

Find the bad commit

I’m going to assume HEAD is a bad commit (meaning that the test executable fails).

Find a good commit

Go back a reasonable amount of time (e.g. make an educated guess, like 1 month) and find a commit that doesn’t fail the test executable.

Bisect!

After you have your good commit, just run a set of commands and git bisect will track down the source of the problem for you:

bad_commit=HEAD
good_commit=fbb3823
git bisect start $bad_commit $good_commit
git bisect run private/git-bisect.sh

Eventually, it will have bisected back to the source of the problem, producing output like this:

3f23680fefb5302c780ccc68b5d3006e9f37dd92 is the first bad commit
commit 3f23680fefb5302c780ccc68b5d3006e9f37dd92
Author: He Who Shall Not Be Named <voldemort@example.com>
Date:   Wed Apr 23 11:40:48 2014 -0400

    just change something small, no big deal... honest!

:040000 040000 088559324ff27ec7be6967e8c50934a9837b8f55 e7f89bede815904bb79d5b01807e4e01c8378f14 M      app
bisect run success

That first line identifies SHA 3f23680fefb5302c780ccc68b5d3006e9f37dd92 as the source of the problem, which was right in my case. Yay for automation!
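To see the whole recipe run end to end without a real project, here is a small sketch that builds a throwaway repository with a known bad commit and lets git bisect find it. Everything in it (the file names, the `BROKEN` marker, the `grep` check standing in for a spec) is invented for the demonstration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Ten commits; the "bug" (the word BROKEN) is introduced in commit 7 and stays.
echo "ok" > status.txt
for i in 1 2 3 4 5 6 7 8 9 10; do
  if [ "$i" -eq 7 ]; then
    echo "BROKEN" > status.txt
  fi
  echo "change $i" >> log.txt
  git add status.txt log.txt
  git commit -q -m "commit $i"
done

bad_commit=HEAD
good_commit=$(git rev-list --max-parents=0 HEAD)  # the root commit, "commit 1"

git bisect start "$bad_commit" "$good_commit" > /dev/null
# The test command exits 0 (good) when the marker is absent and 1 (bad) otherwise.
git bisect run sh -c '! grep -q BROKEN status.txt' > /dev/null

# When the run finishes, refs/bisect/bad points at the first bad commit.
first_bad=$(git rev-parse refs/bisect/bad)
subject=$(git log -1 --format=%s "$first_bad")
git bisect reset > /dev/null 2>&1
echo "first bad commit: $subject"
```

The exit code of the test command is what drives the search: 0 marks a commit as good, 125 tells git bisect to skip it, and any other code from 1 to 127 marks it as bad.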

Clean up

Now that I’m all done, I can:

git bisect reset

Git cleans up, and puts me back where I started.

Investigate

Normally just running git show $first_bad_commit will reveal something useful. Tracking down the problem depends on the situation, of course. (Keep in mind that the “first bad commit” might not be the one you’re looking for.)

Good hunting!

Resources

:wq

A Huge List of Free Programming Books

by Ben

A Huge List of Free Programming Books.

Lots of topics. Not all are suitable for offline- or tablet-reading (web-only). Many PDFs are marked with “(PDF)”.

xkcd: ah, the famous “changed code” commit message

by Ben


git_commit

(via XKCD)

So true! But pair programming can help with that. :)

In real life though… I sometimes use the commit message “(m)” when I only changed whitespace or a comment, etc. It’s a lot easier than writing a description that’s longer than the change itself.