I’m skeptical about build reproducibility, but ardent supporters are defending and cheering for it at every opportunity. After a few too many heated discussions, I’ve decided to write down my thoughts on the topic.
I’ll try my best to summarize the arguments for reproducible builds, and explain why I find them unconvincing.
Supporters like to pretend the topic is simple. As one reproducibility fan brusquely put it to me on Twitter:
“Reproducibility is important. Source code A leading to binary B through a reproducible build guarantees what you see (source) is what you get (the binary from the vendor). What is not clear here?”
What isn’t clear is what benefit the reproducibility provides. The only way to verify that the untrusted binary is bit-for-bit identical to the binary that would be produced by building the source code is to produce your own trusted binary first and then compare the two. At that point you already have a trusted binary you can use, so what value did reproducible builds provide?
This diagram demonstrates how to get a trusted binary without reproducible builds.
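To make the comparison step concrete, here is a minimal sketch of what “verify the vendor binary is bit-for-bit identical” amounts to. The function names and file paths are hypothetical, and SHA-256 is just my assumption for the digest. Note that the check needs `local_build` to exist already, which is exactly the point: by the time you can run it, you have a binary you built yourself.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def binaries_match(local_build: Path, vendor_binary: Path) -> bool:
    """True only if both files hash to the same SHA-256 digest.

    `local_build` is the binary you compiled yourself (hypothetical path);
    `vendor_binary` is the untrusted download you want to check.
    """
    return sha256_of(local_build) == sha256_of(vendor_binary)
```

Something like `binaries_match(Path("out/tool"), Path("downloads/tool"))` would be the whole check, assuming those two paths exist.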
The answer to this question is that reproducible builds are not for users. Users are instead expected to nominate somebody they trust to build the software for them, and to verify that the output is correct.
The revised workflow is similar to the first diagram, but now a trusted vendor takes care of the compilation step, so the problem is the same. The trusted vendor has to produce the binary anyway, so why not just use that one? That would make reproducible builds unnecessary.
The answer is that we can design a system where several third parties reproduce the binary, and we can require them all to agree that a binary matches. Here is a diagram of that workflow.
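The agreement rule described here can be sketched in a few lines. This is only an illustration under my own assumptions: the reproducers are modelled as a mapping from a (hypothetical) reproducer name to the hex digest that reproducer reports, and “agree” means every reported digest equals the vendor’s.

```python
from typing import Mapping


def all_reproducers_agree(vendor_digest: str,
                          reproducer_digests: Mapping[str, str]) -> bool:
    """Accept the vendor binary only if every reproducer reports its digest.

    An empty set of reproducers is treated as "no agreement", since nobody
    has actually reproduced anything.
    """
    if not reproducer_digests:
        return False
    return all(d == vendor_digest for d in reproducer_digests.values())


def dissenting_reproducers(vendor_digest: str,
                           reproducer_digests: Mapping[str, str]) -> list:
    """Names of reproducers whose digest differs from the vendor's, sorted."""
    return sorted(name for name, d in reproducer_digests.items()
                  if d != vendor_digest)
```

Whoever runs this check is the party the user ends up trusting, so the sketch says nothing about where it should run.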
The problem with this scenario is that the user still has to trust the vendor to do the verification. If the trusted vendor is compromised, then they can provide tampered binaries. If they’re not compromised, then there was no benefit to reproducing it with third parties.
In effect, this is no different to how the system works today with Linux distributions.
The answer to this problem is that we can build a system where the user only has to trust the vendor once. If the vendor is compromised after that point, the reproduced builds will prevent them from distributing tampered packages to the user.
This is a little more complicated. The user can’t verify that the builds reproduce by compiling them themselves, because then they would already have a trusted build. The answer is for the user to nominate the vendors they trust, and then require a signature from them to install any packages.
Here is that workflow:
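As a rough sketch of the install rule described above, assume the package manager holds a key for each party the user nominated and refuses to install unless all of them have signed the package digest. Everything here is hypothetical: the names, the data layout, and especially the “signature”, which is an HMAC used as a standard-library stand-in. A real system would use public-key signatures so that the user’s machine holds no signing secrets.

```python
import hashlib
import hmac
from typing import Mapping


def sign(key: bytes, package_digest: str) -> str:
    """Stand-in for a real signature: HMAC-SHA256 over the package digest."""
    return hmac.new(key, package_digest.encode(), hashlib.sha256).hexdigest()


def may_install(package_digest: str,
                nominated_keys: Mapping[str, bytes],
                attestations: Mapping[str, str]) -> bool:
    """Allow installation only if every nominated party signed this digest.

    `nominated_keys` maps a party's name to its key; `attestations` maps a
    party's name to the signature it published for this package.
    """
    if not nominated_keys:
        return False
    for name, key in nominated_keys.items():
        provided = attestations.get(name)
        if provided is None:
            return False  # a nominated party has not signed at all
        if not hmac.compare_digest(provided, sign(key, package_digest)):
            return False  # signature does not cover this exact digest
    return True
```

The rule only checks that the nominated parties signed the same bytes. It has no opinion on whether the source those bytes were built from is benign.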
Now if the vendor is compromised or becomes malicious, they can’t give the user any compromised binaries without also providing the source code. This ignores some complexities, like ensuring security updates are delivered even if one vendor is compromised, what to do if the reproducers stop working, or how to reach consensus if the reproducers and your vendor disagree on what software or fork you should be using.
Regardless, even if we ignore these practicalities, the problem with this solution is that the vendor that was only trusted once still provides the source code for the system you’re using. They can still provide malicious source code to the builders for them to build and sign.
I don’t know what supporters suggest as the solution to this problem. Perhaps it is that the vendor you trusted shouldn’t provide any patches, configuration, or any of the system software. If operating system vendors can’t actually modify or configure the operating system, then frankly this doesn’t seem like a useful system.
Perhaps some people are convinced this system is still worthwhile and achievable, but it is clearly not a simple solution. For this reason, I think it is entirely reasonable to be skeptical about reproducible builds, and I don’t think the benefits are as clear as supporters claim.
- Q. It’s easier to audit source code than binaries, and this will make it harder for vendors to hide malicious code.
I don’t think this is true, because of “bugdoors”. A bugdoor is simply an intentional security vulnerability that the vendor can “exploit” when they want backdoor access.
The benefit of bugdoors to attackers is that they offer perfect plausible deniability. If someone catches you, you can simply claim it was a mistake, and there are zero consequences. You can then repeat this ad infinitum: it’s simply not unusual to fix “mistakes” continuously, and there is no way to determine intent.
If someone wants to provide a malicious program, reproducible builds can force them to also provide the source code, but it can’t force the program to be non-malicious, so this is not particularly useful. You already have to trust the source code.
You might claim that I have no data to support this, but that’s the benefit of bugdoors to attackers: There can never be data to prove your wrongdoing.
- Q. It’s easier to tamper with binaries than to write a bugdoor, so reproducible builds do improve security.
With bugdoors, you don’t need to deny it - you just claim it was an error, and you’re automatically forgiven.
- Q. Build servers get compromised, and that’s a fact. Reproducible builds mean proprietary vendors can quickly check if their infrastructure is producing tampered binaries.
I think this is true, but it ignores significant trade-offs. The vendor needs to create and maintain two disparate build infrastructures, and then give additional people privileged access to that new infrastructure. If you don’t do this, there is no benefit to reproducible builds, because you’d be building the same potentially compromised binary twice.
We know that attackers really do want to compromise build infrastructure, but more often they want to steal proprietary source code, which must pass through build servers.
This means that vendors will increase the likelihood of attacks that really are happening, to prevent an attack that could happen.
That is a significant trade-off, and the decision to invest in reproducible builds isn’t as obvious as supporters claim.
- Q. If a user has chosen to trust a platform where all binaries must be codesigned by the vendor, but doesn’t trust the vendor, then reproducible builds allow them to verify the vendor isn’t malicious.
I think this is a fantasy threat model. If the user does discover the vendor was malicious, what are they supposed to do?
The malicious vendor can simply refuse to provide them with signed security updates, so this threat model doesn’t work.
- Q. Non-reproducible builds violate the GPL, because you can’t produce a bit-for-bit identical binary from the provided source code.
I think this argument is ridiculous, and would mean GPL binaries also can’t use code signing or TLS. Clearly the vendor cannot give you the private keys required to produce the code signatures or the CA roots, so by this argument they also violate the GPL.
- Q. Whether it’s useful for end users or not, it will allow experts to monitor for compromised build servers producing tampered builds.
I think this is true, but there are other attacks against compromised build servers, all of which are more common than producing tampered builds.
More often, attackers want signing keys so they can sign their own binaries, steal proprietary source code, inject malicious code into source code tarballs, or malicious patches into source repositories.
Reproducible builds don’t help with any of those problems.
- Q. A reproducible build is a good quality build. Whether there are security benefits or not, I just want people to do it.
Whether reproducible builds are better quality or not is a matter of opinion, and we shouldn’t be trying to force our opinions on others by claiming it’s for security.
I happen to disagree. I don’t think reproducibility makes for a quality build; I think it adds unnecessary complexity.