Cypherpunks, Mixmaster, and Mixminion: Anonymous Email For as long as email has existed, people have wanted to send email anonymously. This post describes the evolution of email anonymity and anonymous message networks in general. It touches on some of the same techniques often used for anonymous packet networks (like Tor) but this post is not about Tor. So, let's suppose that you are you@mail.host and you would like to send mail to a mailing list - let's call it controversial@mail.host. Obviously, you can simply send an email to that list, but SMTP includes the From: address, which will tell everyone who sent it, and you may get in trouble, get ostracized, and generally experience the consequences of your actions. In fact, since SMTP is a cleartext protocol[1], your email will likely get monitored in transit and you may experience such consequences even if nobody actually on the list cares at all about what you wrote. That's pretty rough. What can we do about it? One idea is to simply supply a fake "From:" address in your email, but SMTP is a pretty information-rich protocol and includes a bunch of other fields like, e.g., the IP address you sent mail from which are a lot harder to forge - assuming you can find a mail server willing to accept arbitrary From: addresses anyway. One obvious way around this is what's called a "remailer", which takes email from anywhere and re-sends the mail as though it had come from the remailer itself. For example, your original mail might be: From: you@mail.host To: re@mailer.site and in the body itself you could indicate something like: Destination: controversial@mail.host The remailer could then strip the destination off and mail the rest of the body as though it had come from the remailer itself, i.e.: From: re@mailer.site To: controversial@mail.host and the remailer could handle replies to your mail (by forwarding them back to you) and so on. This is all well and good, especially if you use GPG or something similar for the mail body, since then the remailer doesn't get to see the text of the mail you're sending. In fact, if the remailer has its own GPG key, you could actually encrypt the mail twice, like this: encrypted_message = encrypt(your_key, recipient_key, real_message) wrapped_message = encrypt(your_key, remailer_key, "Destination: ..." + encrypted_message) or, using layer of '*' to show encryption layers, we might write: From: you@mail.host To: re@mailer.site * Destination: controversial@mail.host ** I have a terrible opinion: ...[2] so that someone monitoring the remailer's incoming mail, or your outgoing mail, won't know where your message was supposed to go[3]. Upon receiving your mail, the remailer decrypts it with its own key, stripping off one "layer" of encryption and getting: From: you@mail.host To: re@mailer.site Destination: controversial@mail.host * I have a terrible opinion: ... Note that the second layer of encryption cannot be decrypted by the remailer, since you encrypted it with the desired eventual recipient's key, not the remailer's. This is all well and good. However, there is a problem: you are still trusting the remailer totally. The remailer knows both who you are and who you are mailing - it has to, because it is in direct contact with both you and the eventual recipient. You can avoid *that* by chaining remailers together, and in fact this scheme allows exactly that, by adding more layers of encryption. For example, if we have two separate remailers re@mailer1.site and re@mailer2.site, we could do this: From: you@mail.host To: re@mailer1.site * Destination: re@mailer2.site ** Destination: controversial@mail.host *** I Have a terrible opinion: ... Here, re@mailer1.site strips off the first "layer" and forwards the result to re@mailer2.site, which strips off the second "layer", revealing the ultimate destination, and ultimately sends: From: re@mailer2.site To: controversial@mail.host * I have a terrible opinion: ... This technique is called "onion encryption" because of the layered structure, and it is the same approach that Tor and many other protocols use for producing hop-by-hop anonymity. In this system, re@mailer1.site knows who you are (because you send the mail directly to it) but not where the mail is going and re@mailer2.site knows where the mail is going but not who sent it; the sender and receiver cannot be "linked" by either one of the remailers[4]. This exact system, more or less, is called a "type 1" email anonymity system. While it does work, and is extremely easy to deploy and interoperate with existing mail systems, it has a couple of flaws that make it very easy to attack. Flaw #1: vulnerability to traffic analysis. Removing a layer of encryption decreases the size of a message by a pretty fixed amount (the overhead of whatever encryption is in use), so if someone is monitoring mailer1.site's connection, they can probably guess that your incoming message corresponded to a specific outgoing message, because the sizes of the two will be very well correlated. Naive implementations also have strong time correlation, meaning the next message to leave the relay has a pretty good chance of being the one you just sent. This is pretty bad on its own with just an adversary monitoring mailer1.site, but what if an adversary is monitoring both mailer1.site and mailer2.site? Then they can actually "link" the message the entire way using either timing or message sizes, which is pretty bad news for your terrible opinion. Flaw #2: replays. Anyone who wants to can send new copies of your original message to re@mailer1.site, and it will unwrap them and forward them on exactly as before. This can let adversaries do things like, e.g., wait for times that a relay is less busy and then send a duplicate of your message to make timing correlation easier to do. This *can* be prevented, but it requires care, and attackers can often just wait "long enough" to re-send your message, unless the remailers remember messages they've already forwarded for all eternity. Fortunately, it is possible to design around these two flaws. To defeat replays, we can require that messages be below a certain age and require relays to retain seen message IDs younger than that age only. Defeating traffic analysis requires a few different techniques, but the two biggest ones are: 1. Break messages into fixed-size chunks, and route these chunks around the network. For example, if your email is 1500 bytes and the network's chunk size is 1024, it gets padded to a multiple of the chunk size (2048) then split into two chunks to be sent. Since adversaries don't know what is in any given chunk, they can't tell how long the original message was - just that n chunks are being sent between a pair of relays. With a bit of cleverness, they can't know which chunks are part of which messages. As you strip layers of onion off, you add new padding to ensure that the message stays exactly the same length, and especially the same number of chunks. 2. Break up the time correlation on messages. This unavoidably adds latency to the system, but for privacy purposes this is a Good Thing. A great deal has been written about how to do this, and there are many clever approaches, but a wasteful but relatively bulletproof one is actually to give the network a fixed transmit rate[5] - i.e., re@mailer1.site sends n chunks to re@mailer2.site every minute, regardless of whether it has data to send or not; the ones that are not real data are dropped by re@mailer2.site after it decrypts them. This defeats analysis of message timing[6]. The combination of these two techniques, and some other improvements to the protocol, yields the "type 2" or "mixmaster" type remailers, named after one of the main implementations of this approach. => https://datatracker.ietf.org/doc/html/draft-sassaman-mixmaster-00 However, there is another flaw waiting in the wings, and this one is harder to fix. Flaw #3: tagging. This is a more insidious attack. You actually can't digitally sign your messages at any of the "outer" layers of the encryption, because if you did, you would have to use a key that the remailers knew to be able to validate the signature in the first place. If you did that, you'd be telling the relays who you are, which defeats the entire point. Obviously you can sign the innermost message with some appropriately pseudonymous GPG key which your recipient is aware of, but signing the outer layers is impossible. That leads to the following nefarious idea: an attacker can flip a bit (or several bits) in the *payload part* of the message as it is sent from you to re@mailer1.site - easy enough to do, since SMTP is generally plaintext. This will make the message eventually decrypt to gibberish, of course, but re@mailer1.site has no idea about that - all it does is strip off a layer of encryption and pass the message. The second remailer does the same, and now a nonsense message is sent to controversial@mail.site. If your adversary is monitoring both you and controversial@mail.site, your incoming message (which they flipped bits in) and the outgoing corrupted message are now pretty strongly linked. Eek! This problem actually turns out to be really annoying to fix. We can't sign the outer layers to make them tamper-evident, so the relays can't drop corrupted data mid-stream, and allowing the corrupted message to reach the end of the relay chain is exactly what allows the attack in the first place. One approach is to actually encrypt part of the routing data together with the payload message, using a block cipher with a very large block size - ideally, the size of the entire messages. Good block ciphers have the property that, if you flip even one bit of the input, every bit in the output changes with probability 1/2. Let's imagine we have two functions: AE(k, m): asymmetrically encrypt m to recipient public key k SE(k, m): symmetrically encrypt m with secret key k, with a wide block cipher and suppose we're going to use three relays, with public keys Kr0, Kr1, and Kr2, and we're ultimately trying to mail a destination d with key Kd. In our old, tagging-vulnerable scheme, we were sending: AE(Kr0, "r1" + AE(Kr1, "r2" + AE(Kr2, "d" + AE(Kd, m)))) Since all those asymmetric encryptions are unauthenticated (there's nothing for the relays to authenticate them with), bits can be freely flipped in the outer message's payload part and those flips won't cause a problem until the last leg. However, if we instead generate a temporary symmetric key Kt and do this: AE(Kr0, "r1" + AE(Kr1, "r2" + Kt + SE(Kt, AE(Kr2, "d" + AE(Kd, m))))) now, when the message reaches the middle relay r1, r1 does an extra step: it strips a layer of symmetric encryption off both the message *and* the rest of the routing data. Since the block cipher has this neat property that any bit flip is likely to destroy the message, any modification of the message before it reaches r1 is likely to leave the rest of the routing data invalid after r1 unwraps it, and r1 can drop it. An attacker can still tag messages going into the network, but they will be dropped halfway through and not emerge as detectable corrupt messages. This approach, more or less, is what the "mixminion" mail relay does, and this is known as a "type 3" relay. Mixminion also moves away from SMTP as a transport for packets being relayed and towards a protocol running over TLS, so it's much harder for attackers to either see or modify messages in transit throrugh the network. You can read more about Mixminion here: => https://www.mixminion.net/minion-design.pdf Mixminion has one other fascinating feature, which I mentioned above briefly: pseudonymous replies. In the earlier designs it was impossible to actually reply to a pseudonymous message and have the reply be delivered, since the mix network would have no idea where to send the reply to without you having some public email address to receive it at - and if you had that address and included it, you wouldn't be anonymous to begin with! Mixminion's approach to that is what's called a "Single-Use Reply Block": when sending a message, you can include *just the routing information* needed to send a reply to you, onion-wrapped as usual, and leave the message body empty. The recipient of your message can, through a bit of cryptographic trickery that I've papered over above, put their message into that empty "slot" and send it to you, and it will traverse the remailer network back to you exactly as if they had pseudonymously emailed you - but without them ever actually knowing your email address. Very slick! Mixminion also added a couple of other features which are a little less interesting for this post but needed to make the protocol work well: a notion of directory servers, so that relays can rotate their keys often and clients can still know about them, and a notion of a "nym server", which is a server that accepts email to some address (like you@nym.server) and uses one of a cache of Single-Use Reply Blocks it has for you to fire a "reply" to you over the remailer network. That allows for email addresses that are really pseudonymous and really publicly mailable without someone needing to use mixminion themselves to do it, which is pretty darn cool. There's one more protocol (sort of) that deserves a mention here: Sphinx. While Mixminion and the previous protocols work by layering asymmetric encryption many times, this takes a bit of CPU to do and produces very long messages - eg, if your relay path has 10 hops, doing 10 layers of RSA would be a few kilobytes of overhead if not more. This is fine for email but rather unsatisfactory for smaller-scale stuff. One of the key insights of Sphinx is that one can use Diffie-Hellman to produce a shared secret with each relay in turn and then use symmetric encryption (only) to create the "onion" layers. At each layer, one computes a "blinding factor" for that layer, which is multiplied into the Diffie-Hellman public value at that layer. For example, if the blinding factors are b0, b1, ..., then the onion layers might have secrets: g ** x g ** (x * b0) g ** (x * b0 * b1) which allows reuse of the same single group value at each hop, with one blinding factor exponent being divided out each time. When you send, you can pre-compute what each layer's public exponent will be with its blinding factors divided out to know what the shared secrets will be, and use those to derive the keys used for the symmetric encryption. That explanation is probably very hard to follow; I recommend the Sphinx paper instead: => http://www0.cs.ucl.ac.uk/staff/G.Danezis/papers/sphinx-eprint.pdf or maybe this specification, regrettably from something cryptocurrency-related: => https://katzenpost.mixnetworks.org/docs/specs/sphinx.html That's it on this topic for now; I will write more in the future about Tor, Freenet, I2P, and other anonymity projects since I find the whole thing rather fascinating. Until next time! [1]: In the Year of Our Lord 2022, this is still often true. [2]: I realize that people don't only use anonymous mail to send terrible opinions, in the same way that people don't only use Tor to sell drugs. [3]: Of course, someone monitoring both legs would. [4]: But it could be linked by the two of them cooperating, or an adversary capable of monitoring all their connections - like, say, a large-ish government agency. [5]: This technique has a lot of fun uses. For further reading, take a look at Pond: => https://github.com/agl/pond [6]: It is notoriously easy to come up with a clever, seemingly-secure scheme here that still allows recovery of correlations via statistics.