# Binary Transparency

Suppose that you are a Linux distribution, and you are in the business of shipping binaries to users, as most Linux distributions are[1]. Since users will generally install the new binaries you ship in a pretty timely way, that also provides a very powerful mechanism for attacks on your users, either by you or by someone else who compromises your build system. It would be very cool if you could somehow reduce that risk.

Fortunately, there's a collection of techniques, sometimes collectively called "binary transparency", that lets you manage that risk on behalf of your users and lets your users do the "verify" part of "trust-but-verify". Here's how it works!

## What Actually Is Binary Transparency?

Binary transparency provides end users with an audit trail of how a specific artifact was made. For programs, the artifact is usually the actual program binary, and the audit trail in question describes which source code that program was built from. You can use the same sort of techniques for other kinds of artifacts though - for example, certificate authorities use some of the same mechanisms to provide an audit trail for certificates they sign[2]. Let's focus on the program case for now.

You want users to be assured that the binaries they're getting from you were built from specific source code. How? There are several parts to it:

1. Being able to identify a specific set of source files (a revision)
2. Making the translation from a source revision to a binary reproducible
3. Proving that every binary comes from a specific source revision
4. Verifying on the user side that every binary has a correct audit log

## Identifying Revisions

There are a few different ways to do this, but the gold standard is a hash, taken over *all* the inputs to the build process - the source files of your program, all the included source files (system headers and so on), all the resources, all the build scripts, and so on. Listing all of those out can turn into quite a hobby, and making sure the list stays complete can be even more of one, so if you can it's best to build inside an empty chroot or similar so that your build process doesn't pick up files that aren't checked in, new system headers that you didn't account for, and that sort of thing[3].

If you already use a version control system where revisions are hashes, great - the version control system will validate hashes for source files and tie them together into revisions for you. If you're very lucky you can check everything, including the build scripts *and* your dependencies[4], into one repository and be able to identify all of the build inputs with a single version control revision. If you have multiple repositories or your version control system doesn't do hashing for you, you'll probably need to do a bit more work; one relatively cheap way is to always do builds from tarballs, in an empty chroot, and use the tarball hash to identify the source revision[5].
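To make the idea concrete, here's a minimal sketch of that kind of input hashing, assuming all of the build inputs have already been checked out under a single directory (`./src` is a placeholder). It walks the tree in sorted order and folds every file's path and contents into one SHA-256 digest; real systems (git, Nix, and friends) use richer object formats, and this sketch deliberately ignores file modes and symlinks.

```go
// hashtree: identify a source revision by hashing every build input.
// A minimal sketch, not how any particular version control system does it.
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
)

// hashTree walks root, folding each regular file's relative path and
// contents into one SHA-256 digest. Sorting the file list keeps the
// result deterministic regardless of filesystem ordering.
func hashTree(root string) (string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type().IsRegular() {
			files = append(files, path)
		}
		return nil
	})
	if err != nil {
		return "", err
	}
	sort.Strings(files)

	h := sha256.New()
	for _, path := range files {
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return "", err
		}
		fmt.Fprintf(h, "%s\x00", rel) // include the path, so renames change the hash
		f, err := os.Open(path)
		if err != nil {
			return "", err
		}
		_, err = io.Copy(h, f)
		f.Close()
		if err != nil {
			return "", err
		}
		fmt.Fprintf(h, "\x00")
	}
	return fmt.Sprintf("%x", h.Sum(nil)), nil
}

func main() {
	// "./src" is a placeholder for wherever the build inputs are checked out.
	rev, err := hashTree("./src")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("source revision:", rev)
}
```

Any change to any input - a source file, a build script, a vendored dependency - changes the resulting revision hash.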
However, the source files aren't the only components of the build. There are also the build tools...

## Reproducible Builds

Unless you are particularly ambitious, you probably won't want to include compiler binaries, copies of the system headers, and so on in your version control system[6]. You'll probably have to settle for depending on binary versions of those that are installed on the system or provided some other way.

However, a property you do want is that the build is *reproducible*, which means that for a given source revision and set of build tools, the output binary is always bit-for-bit identical. There are many good reasons to do this, but the most important one for binary transparency purposes is that reproducibility allows others to actually validate your claim that something was built from a specific revision: they can check your source out at that revision, install the same build tools you used, and get the same resulting binary.

Let's take a specific example: say you built your program from revision X with clang 15.0.6 and a certain set of build flags. Anyone else[7] should be able to check out revision X, build with clang 15.0.6 and that same set of build flags, and get the exact same binary, bit-for-bit. Ideally you would have your continuous integration system validate this, by doing two separate builds and comparing the results to each other.

To make builds reproducible, you will also want to make them hermetic: they depend *only* on a specified list of dependencies and on nothing else. You can enforce that using chroot(1) or a sandboxing tool, but in the process you will also need to chase down all the dependencies of your compiler toolchain and so on, so that you can include them in that dependency list.
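As a toy version of that CI check, the sketch below runs the same build twice and fails if the output differs. The `make` invocation and the `program` artifact name are placeholders for whatever your build system actually runs and produces, and a real setup would run each build inside a fresh chroot or container rather than just a temporary directory.

```go
// reprocheck: a toy CI step that builds the same source twice and fails
// if the resulting artifact is not bit-for-bit identical.
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// buildOnce assumes srcDir already holds a pristine checkout of the source
// revision, builds it with output going to a scratch directory, and returns
// the SHA-256 of the produced artifact.
func buildOnce(srcDir string) ([]byte, error) {
	scratch, err := os.MkdirTemp("", "repro-build-")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(scratch)

	cmd := exec.Command("make", "-C", srcDir, "OUT="+scratch) // placeholder build command
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		return nil, err
	}

	artifact, err := os.ReadFile(filepath.Join(scratch, "program")) // placeholder artifact
	if err != nil {
		return nil, err
	}
	sum := sha256.Sum256(artifact)
	return sum[:], nil
}

func main() {
	a, err := buildOnce("./src")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	b, err := buildOnce("./src")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if !bytes.Equal(a, b) {
		fmt.Printf("NOT reproducible: %x != %x\n", a, b)
		os.Exit(1)
	}
	fmt.Printf("reproducible; binary revision %x\n", a)
}
```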
## Audit Logs

So, you have a cryptographic way to identify a set of source files, and a reproducible translation from source files to binaries, meaning that any specific user can check that any specific binary you gave them is genuinely built from some source revision. However, you can't have every user do a full source build - that would defeat the entire purpose of shipping the binaries in the first place. What you want is some way for users to be confident that the binary you're giving them really did come from a source revision, without them having to actually do the build. How?

The first key tool is a public, append-only cryptographic log, signed by you, of every binary revision that you ship. When clients are going to fetch a new binary from you, they also fetch this log, and check that the revision you're offering them is present in that log.

So far so good, but that's not enough - you (or anyone with your signing keys) can still do this:

1. Generate a malicious binary in any desired way
2. Take the existing append-only (public) cryptographic log, append the malicious binary's hash to it, and then send that modified log, *with* the malicious binary, to only a specific target user

Since your builds are reproducible, if that user happens to actually check, they'll find that the binary they have doesn't correspond to any source revision - but users will in general never bother to check that unless they're very, very paranoid[8]. Since (from their perspective) the append-only log does include the binary's hash, everything seems above board.

Luckily, we can do better. What we really want is to ensure that there is only one single public audit log, and that individual users can't be given a modified version of it. We can do that by using "witnesses", which are third parties[9] that attest that they've seen specific revisions in the log, and perhaps even that they've reproduced a build from source at that revision. When a client is about to use a new binary, it checks not only that the binary's hash appears in the audit log, but that there are witness attestations of that hash as well.

If you are feeling extra spicy, you then include some code in your updater (the code that fetches binaries and checks that they are in the public audit log and properly witnessed) which complains loudly to both you and the witnesses if it ever sees either a binary that isn't in the log, or a log entry that hasn't been properly witnessed. That basically prevents anyone - even you! - from quietly shipping a malicious binary to only certain users.

## Putting It All Together

Your build process then looks like this:

1. Create a fresh environment (chroot, docker container, VM, whatever)
2. Check out a source revision and its dependencies into it
3. Compile and produce artifacts
4. Hash all those artifacts together to make a "binary revision"
5. Make an entry in your append-only log: "binary revision X came from source revision Y". Since your build is reproducible, anyone else with source revision Y can validate this claim.
6. Attach your signature to the new head of your append-only log, and submit it to the witnesses for witnessing.
7. Once enough witnesses (a quorum) have witnessed it, take their signatures and attach them to the log head as well.
8. Publish the new log head.

Now, clients fetching updates do this:

1. Fetch the append-only log, check your signature on the head, and check the witness signatures on the head.
2. Figure out which version they want to install, validate that it appears in the log, and install it.

Note that clients might not always be installing the head version in the log - for example, if you ship a beta version of your software, non-beta clients might be installing the most recent stable version instead. However, every version you ever publish has to appear *somewhere* in the log - just not necessarily at the head.
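Here's a minimal sketch of that client-side check. The `Entry`/`Log` layout, the plain hash chain, and the quorum parameter are all hypothetical simplifications: a production log would be a Merkle tree with inclusion and consistency proofs (along the lines of RFC 9162) so that clients don't have to download every entry, and the keys would be pinned in the updater rather than generated on the fly.

```go
// verifyupdate: toy client-side check - publisher signature on the log
// head, a quorum of witness signatures on the same head, and inclusion of
// the binary about to be installed.
package main

import (
	"crypto/ed25519"
	"crypto/sha256"
	"fmt"
)

// Entry records the claim "binary revision BinaryHash was built from
// source revision SourceRev".
type Entry struct {
	BinaryHash string
	SourceRev  string
}

// Log is the published append-only log plus the signatures over its head.
type Log struct {
	Entries     []Entry
	HeadSig     []byte            // publisher's signature over the head hash
	WitnessSigs map[string][]byte // witness name -> signature over the head hash
}

// headHash chains every entry into a single head digest.
func headHash(entries []Entry) []byte {
	head := make([]byte, sha256.Size)
	for _, e := range entries {
		next := sha256.Sum256(append(head, []byte(e.BinaryHash+"\x00"+e.SourceRev)...))
		head = next[:]
	}
	return head
}

// verify returns nil only if the head is properly signed and witnessed and
// binaryHash appears somewhere in the log.
func verify(tlog Log, publisher ed25519.PublicKey, witnesses map[string]ed25519.PublicKey,
	quorum int, binaryHash string) error {

	head := headHash(tlog.Entries)
	if !ed25519.Verify(publisher, head, tlog.HeadSig) {
		return fmt.Errorf("publisher signature on log head is invalid")
	}
	good := 0
	for name, pub := range witnesses {
		if sig, ok := tlog.WitnessSigs[name]; ok && ed25519.Verify(pub, head, sig) {
			good++
		}
	}
	if good < quorum {
		return fmt.Errorf("only %d of the required %d witness signatures verified", good, quorum)
	}
	for _, e := range tlog.Entries {
		if e.BinaryHash == binaryHash {
			return nil // logged and witnessed: safe to install
		}
	}
	return fmt.Errorf("binary %s does not appear in the transparency log", binaryHash)
}

func main() {
	// Toy keys and a one-entry log, standing in for real published data.
	pubKey, pubPriv, _ := ed25519.GenerateKey(nil)
	witKey, witPriv, _ := ed25519.GenerateKey(nil)

	tlog := Log{Entries: []Entry{{BinaryHash: "abc123", SourceRev: "rev42"}}}
	head := headHash(tlog.Entries)
	tlog.HeadSig = ed25519.Sign(pubPriv, head)
	tlog.WitnessSigs = map[string][]byte{"witness-1": ed25519.Sign(witPriv, head)}

	trusted := map[string]ed25519.PublicKey{"witness-1": witKey}
	fmt.Println("verify:", verify(tlog, pubKey, trusted, 1, "abc123")) // prints "verify: <nil>"
}
```

The load-bearing detail is that you and the witnesses all sign the *same* head hash, so a log that has been quietly extended for one target user can't carry valid witness signatures unless the witnesses are in on it - which is exactly the collusion case discussed below.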
## What Does This Actually Give You?

Compared to simply signing your binary updates, this technique doesn't actually *prevent* any new attacks - instead, it makes them impossible to execute without leaving a cryptographically-verifiable audit trail. For example:

* Someone who compromises your build system can't ship a new version of your product to anyone without leaving a publicly verifiable record of having done so
* If your witnesses also validate your source revision -> binary revision assertion, someone who compromises your build system can't make a binary change without a corresponding publicly logged source change, which makes it much easier to detect compromised developer credentials
* You can't ship backdoored or malicious binaries to specific users - only to everyone or to no-one. This vastly increases the chance that you will get caught in the process, especially if your malicious binary change requires a malicious source change.

## What Doesn't This Protect Against?

* If you (or someone else) patch your updater to remove the validation logic, you can do an end-run around this protection. To avoid that, your update mechanism itself also has to be protected by binary transparency.
* If a quorum of witnesses collude with you, you could get them to "witness" an entry in the append-only log that you don't show to anyone else, and use that to compromise a specific client. However, that client would end up with cryptographic proof of their collusion with you, if they happened to notice.

## Further Reading

* Mozilla's writeup: https://wiki.mozilla.org/Security/Binary_Transparency
* Google's writeup: https://developers.google.com/android/binary_transparency/pixel
* RFC 9162 (Certificate Transparency v2)

[1]: There are some distros that are fully "from source" and need a host distro to bootstrap a compiled environment.
[2]: They do this because they were required to by web browsers, not because they want to.
[3]: Chasing down changes in system headers is an excellent reason to keep your dependencies as minimal as you can.
[4]: This approach, called "vendoring", is popular in large projects mostly because making sure that dev machines have the right version of every dependency is a big headache, but it does also let you include dependency revisions in your source revision very easily.
[5]: However, the tarball needs to include any needed system headers, and you need to generate the tarball in a deterministic way to avoid problems, which means hardcoded mtimes, file modes, owners/groups, and so on.
[6]: Chromium does this, and in fact has a separate binary revision control system called 'cipd' for fetching compiler toolchains and SDKs by hash: https://chromium.googlesource.com/infra/luci/luci-go/+/master/cipd/
[7]: If you are working on closed-source software for some reason, "anyone else" might be "your coworkers" or "your company's internal security department" instead of "random people on the internet", but the principle is the same.
[8]: If they are very, very paranoid, they probably are building from source themselves anyway.
[9]: Specifically trustworthy, notable third parties, like the EFF or the IETF or someone, instead of Honest Bob's Used Cars and Binary Audit Log Witnessing.