The technology community still uses SHA-1 for many things. One of the most concerning implications of this team’s technique is that it opens the door to attacks against Git, which uses SHA-1 to identify every commit. Imagine if you had a tag (a SHA-1 sum) that referred to two different sets of changes: a benign changeset on your machine and a malicious changeset on GitHub. You deploy that tag, and the malicious code runs instead of the code you expected.
As far as I know, such an attack on Git hasn’t been demonstrated yet, but in theory, I think you could replace a SHA-1 commit as I described. I bet someone will demonstrate that someday. (Think of padding files with bogus comments until you get the checksum you want.) It would be difficult (though not impossible) to switch Git to SHA-256, but I don’t know of any efforts to do that — though Git 2.11 is starting to acknowledge that abbreviated SHA-1 checksums do collide in practice.
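To make the “SHA-1 sum” part concrete, here is a small sketch of how Git derives the ID of a file’s contents. It assumes `sha1sum` from GNU coreutils is on your path and that Git is using its default SHA-1 object format; the `hello` content is just an example.

```shell
#!/bin/sh
set -eu

content='hello'

# Git does not hash the raw bytes. It hashes a header ("blob", the size in
# bytes, a NUL byte) followed by the content. "hello" plus a newline is 6 bytes.
git_id=$(printf '%s\n' "$content" | git hash-object --stdin)
manual_id=$(printf 'blob 6\0%s\n' "$content" | sha1sum | cut -d' ' -f1)

echo "git hash-object: $git_id"
echo "manual SHA-1:    $manual_id"
```

Both lines should print the same 40-character ID. Commits and tags are hashed the same way with a different header, which is why a practical SHA-1 collision matters to Git.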
Will such an attack happen today or tomorrow? Probably not; it takes a huge amount of resources right now. However, computation is cheaper than ever. I bet attackers will start to use services like Travis CI for computations like this, much as I’ve heard people have started mining Bitcoin in pull requests on open source projects.
The best mitigation I’m currently aware of is cryptographically signing your commits, and this may be a catalyst for that to become standard practice.
Can’t push commits to GitHub, but need to move them to another clone? Adding another remote is a good option to get you out of a bind.
Why did this come up?
We do a lot of pair programming at Continuity. One of the great things about it is that if someone is out sick, you can usually depend on the other person to know the details of what was happening. But what if both people are out sick and they forgot to push their branch to GitHub and the deadline is today?
(That happened last week!)
The solution described in this post is a little specific to how we pair program remotely. Our team is almost 50% remote developers, and we all share access to the same EC2 instance. We use wemux and vim to pair (among other tools), and the provisioning is automated by puppet.
Back to the tale of the missing pair: one idea that came up was adding another SSH key to the missing person’s ~/.ssh/authorized_keys, but that has a number of problems:
That makes their account a shared account. If someone did something bad using the account, you couldn’t easily tell who did it.
If the key is added by puppet (as it should be), that adds a lot of labor just to add a temporary key.
You still can’t push any commits to GitHub without also installing the key there.
Finally, installing keys on another user’s account just feels like a practice to avoid, if for no other reason.
Instead, what we decided to do was to become their user via sudo (which is logged) and then push to a local clone of the git repository. That might sound complicated if you’ve only ever used git in combination with GitHub, but it’s actually a lot simpler than you’d think.
What we’re going to do is push the absent user’s code to a shared directory, then pull it to your clone of the repository.
We’ll refer to the absent user as them and the user that needs access to the code as you.
Just to set the stage, let’s look at the normal remotes for a clone from GitHub:
That’s a lot like a private GitHub repo, but without any of the pretty web user interface or authorization management.
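Here is a minimal sketch of the whole workflow that you can run in a scratch directory. The remote name `shared`, the branch name `feature`, and the temporary paths are all made up for illustration; in real life the first half would run as them (for example after `sudo -iu them`) and the second half as you.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Stand-in for their clone: a branch with work that never reached GitHub.
git init -q "$work/them"
cd "$work/them"
git config user.email "them@example.com"
git config user.name "Them"
git checkout -q -b feature
echo "work in progress" > notes.txt
git add notes.txt
git commit -q -m "Unpushed work"

# As them: create a bare repository in a shared directory and push to it.
git init -q --bare "$work/shared.git"
git remote add shared "$work/shared.git"
git push -q shared feature

# As you: add the same directory as a remote and fetch their branch.
git init -q "$work/you"
cd "$work/you"
git remote add shared "$work/shared.git"
git fetch -q shared
git checkout -q -b feature shared/feature

git remote -v
git log --oneline -1
```

On a real multi-user machine you would also need the shared directory to be readable by you; `git init --bare --shared` is one way to get group-friendly permissions. Once the branch is in your clone, you can push it to GitHub from your own account as usual.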
I hope this example helps illustrate that git is very flexible and can bend to your needs. If you’ve only ever used GitHub as a remote when using git, you’re missing out on some useful functionality! However, it’s still best to have all your code in a central location, even though you could add a thousand remotes. Like any tool, know when to use it!
I’ve seen a lot written about reasons why your organization should keep a monolithic repository instead of a collection of many smaller repositories, especially for internal code (non-OSS). I’ve had similar experiences to what is described in these posts.
And before you think “we’re getting too big for that,” keep in mind that these recommendations are coming out of big players like Facebook and Google.
Git ships with an awesome, underused utility called git-bisect. I had a bug to track down today that already had a spec, so it was a perfect fit. Normally our continuous integration (CI) service would have alerted us earlier, but unfortunately the failure was masked by another problem.
1 executable to test a commit
1 known bad commit (often HEAD)
1 known good commit
Prepare the test executable
In this case, I’ve called it private/git-bisect.sh and filled it with this:
# Don't forget to `chmod +x` this file.
# You can add more steps here if necessary, e.g. installing dependencies.
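As a rough sketch of what such a test executable can look like, the snippet below writes one to private/git-bisect.sh. `SETUP_CMD` and `TEST_CMD` are hypothetical placeholders of mine, not anything Git defines; point them at your own dependency install and at the failing spec. The exit codes follow what `git bisect run` expects: 0 for good, 1 for bad, 125 for “can’t test this commit, skip it”.

```shell
mkdir -p private
cat > private/git-bisect.sh <<'EOF'
#!/bin/sh
# Don't forget to `chmod +x` this file.
# SETUP_CMD and TEST_CMD are hypothetical placeholders, not part of Git.
: "${SETUP_CMD:=true}"
: "${TEST_CMD:=true}"

# If setup fails, this commit can't be tested: 125 tells git bisect to skip it.
sh -c "$SETUP_CMD" || exit 125

# Exit 0 marks the commit good; exit 1 marks it bad.
if sh -c "$TEST_CMD"; then
  exit 0
else
  exit 1
fi
EOF
chmod +x private/git-bisect.sh
```

You can try it by hand before bisecting: run it on HEAD and check that `echo $?` prints a non-zero value.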
Find the bad commit
I’m going to assume HEAD is a bad commit (meaning that the test executable fails).
Find a good commit
Go back a reasonable amount of time (e.g. make an educated guess, like 1 month) and find a commit that doesn’t fail the test executable.
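One way to turn “about a month ago” into a candidate commit is `git rev-list --before`. The sketch below builds a throwaway repository with one old and one new commit so it can run anywhere; the dates and file names are invented for the example.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Dev"

# One commit with a fixed date in 2014, then one commit dated now.
echo old > file.txt
git add file.txt
GIT_AUTHOR_DATE="2014-03-23T12:00:00" GIT_COMMITTER_DATE="2014-03-23T12:00:00" \
  git commit -q -m "old commit"
old=$(git rev-parse HEAD)
echo new > file.txt
git commit -q -am "new commit"

# Newest commit on this branch whose committer date is at least a month old.
candidate=$(git rev-list -n 1 --before="1 month ago" HEAD)
git log -1 --format='%h %s' "$candidate"
```

The commit this finds is only a guess. Check it out and run the test executable against it before telling git bisect that it is good.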
After you have your good commit, just run a set of commands and git bisect will track down the source of the problem for you:
Eventually, it will have bisected back to the source of the problem, producing output like this:
3f23680fefb5302c780ccc68b5d3006e9f37dd92 is the first bad commit
Author: He Who Shall Not Be Named <firstname.lastname@example.org>
Date: Wed Apr 23 11:40:48 2014 -0400
just change something small, no big deal... honest!
:040000 040000 088559324ff27ec7be6967e8c50934a9837b8f55 e7f89bede815904bb79d5b01807e4e01c8378f14 M app
bisect run success
That first line identifies SHA 3f23680fefb5302c780ccc68b5d3006e9f37dd92 as the source of the problem, which was right in my case. Yay for automation!
Now that I’m all done, I can:
git bisect reset
Git cleans up, and puts me back where I started.
Normally just running git show $first_bad_commit will reveal something useful. Tracking down the problem depends on the situation, of course. (Keep in mind that the “first bad commit” might not be the one you’re looking for.)