Cypherpunks, Mixmaster, and Mixminion: Anonymous Email

For as long as email has existed, people have wanted to send email anonymously.
This post describes the evolution of email anonymity and anonymous message
networks in general. It touches on some of the same techniques often used for
anonymous packet networks (like Tor) but this post is not about Tor.

So, let's suppose that you are you@mail.host and you would like to send mail
to a mailing list - let's call it controversial@mail.host. Obviously, you can
simply send an email to that list, but SMTP includes the From: address, which
will tell everyone who sent it, and you may get in trouble, get ostracized,
and generally experience the consequences of your actions. In fact, since SMTP
is a cleartext protocol[1], your email will likely get monitored in transit and
you may experience such consequences even if nobody actually on the list cares
at all about what you wrote.

That's pretty rough. What can we do about it? One idea is to simply supply a
fake "From:" address in your email, but SMTP is a pretty information-rich
protocol and includes a bunch of other fields like, e.g., the IP address you
sent mail from which are a lot harder to forge - assuming you can find a mail
server willing to accept arbitrary From: addresses anyway.

One obvious way around this is what's called a "remailer", which takes email
from anywhere and re-sends the mail as though it had come from the remailer
itself. For example, your original mail might be:

  From: you@mail.host
  To: re@mailer.site

and in the body itself you could indicate something like:

  Destination: controversial@mail.host

The remailer could then strip the destination off and mail the rest of the body
as though it had come from the remailer itself, i.e.:

  From: re@mailer.site
  To: controversial@mail.host

and the remailer could handle replies to your mail (by forwarding them back to
you) and so on.

This is all well and good, especially if you use GPG or something similar for
the mail body, since then the remailer doesn't get to see the text of the mail
you're sending. In fact, if the remailer has its own GPG key, you could actually
encrypt the mail twice, like this:

  encrypted_message = encrypt(your_key, recipient_key, real_message)
  wrapped_message = encrypt(your_key, remailer_key,
                            "Destination: ..." + encrypted_message)

or, using layer of '*' to show encryption layers, we might write:

  From: you@mail.host
  To: re@mailer.site
  * Destination: controversial@mail.host
  ** I have a terrible opinion: ...[2]

so that someone monitoring the remailer's incoming mail, or your outgoing mail,
won't know where your message was supposed to go[3]. Upon receiving your mail,
the remailer decrypts it with its own key, stripping off one "layer" of
encryption and getting:

  From: you@mail.host
  To: re@mailer.site
  Destination: controversial@mail.host
  * I have a terrible opinion: ...

Note that the second layer of encryption cannot be decrypted by the remailer,
since you encrypted it with the desired eventual recipient's key, not the
remailer's. This is all well and good.

However, there is a problem: you are still trusting the remailer totally. The
remailer knows both who you are and who you are mailing - it has to, because it
is in direct contact with both you and the eventual recipient.

You can avoid *that* by chaining remailers together, and in fact this scheme
allows exactly that, by adding more layers of encryption. For example, if we
have two separate remailers re@mailer1.site and re@mailer2.site, we could do
this:

  From: you@mail.host
  To: re@mailer1.site
  * Destination: re@mailer2.site
  ** Destination: controversial@mail.host
  *** I Have a terrible opinion: ...

Here, re@mailer1.site strips off the first "layer" and forwards the result to
re@mailer2.site, which strips off the second "layer", revealing the ultimate
destination, and ultimately sends:

  From: re@mailer2.site
  To: controversial@mail.host
  * I have a terrible opinion: ...

This technique is called "onion encryption" because of the layered structure,
and it is the same approach that Tor and many other protocols use for producing
hop-by-hop anonymity. In this system, re@mailer1.site knows who you are (because
you send the mail directly to it) but not where the mail is going and
re@mailer2.site knows where the mail is going but not who sent it; the sender
and receiver cannot be "linked" by either one of the remailers[4].

This exact system, more or less, is called a "type 1" email anonymity system.
While it does work, and is extremely easy to deploy and interoperate with
existing mail systems, it has a couple of flaws that make it very easy to
attack.

Flaw #1: vulnerability to traffic analysis. Removing a layer of encryption
decreases the size of a message by a pretty fixed amount (the overhead of
whatever encryption is in use), so if someone is monitoring mailer1.site's
connection, they can probably guess that your incoming message corresponded to a
specific outgoing message, because the sizes of the two will be very well
correlated. Naive implementations also have strong time correlation, meaning the
next message to leave the relay has a pretty good chance of being the one you
just sent.

This is pretty bad on its own with just an adversary monitoring mailer1.site,
but what if an adversary is monitoring both mailer1.site and mailer2.site? Then
they can actually "link" the message the entire way using either timing or
message sizes, which is pretty bad news for your terrible opinion.

Flaw #2: replays. Anyone who wants to can send new copies of your original
message to re@mailer1.site, and it will unwrap them and forward them on exactly
as before. This can let adversaries do things like, e.g., wait for times that a
relay is less busy and then send a duplicate of your message to make timing
correlation easier to do. This *can* be prevented, but it requires care, and
attackers can often just wait "long enough" to re-send your message, unless the
remailers remember messages they've already forwarded for all eternity.

Fortunately, it is possible to design around these two flaws. To defeat replays,
we can require that messages be below a certain age and require relays to retain
seen message IDs younger than that age only. Defeating traffic analysis requires
a few different techniques, but the two biggest ones are:

1. Break messages into fixed-size chunks, and route these chunks around the
   network. For example, if your email is 1500 bytes and the network's chunk
   size is 1024, it gets padded to a multiple of the chunk size (2048) then
   split into two chunks to be sent. Since adversaries don't know what is in any
   given chunk, they can't tell how long the original message was - just that n
   chunks are being sent between a pair of relays. With a bit of cleverness,
   they can't know which chunks are part of which messages. As you strip layers
   of onion off, you add new padding to ensure that the message stays exactly
   the same length, and especially the same number of chunks.

2. Break up the time correlation on messages. This unavoidably adds latency to
   the system, but for privacy purposes this is a Good Thing. A great deal has
   been written about how to do this, and there are many clever approaches, but
   a wasteful but relatively bulletproof one is actually to give the network a
   fixed transmit rate[5] - i.e., re@mailer1.site sends n chunks to
   re@mailer2.site every minute, regardless of whether it has data to send or
   not; the ones that are not real data are dropped by re@mailer2.site after it
   decrypts them. This defeats analysis of message timing[6].

The combination of these two techniques, and some other improvements to the
protocol, yields the "type 2" or "mixmaster" type remailers, named after one of
the main implementations of this approach.

=> https://datatracker.ietf.org/doc/html/draft-sassaman-mixmaster-00

However, there is another flaw waiting in the wings, and this one is harder to
fix.

Flaw #3: tagging. This is a more insidious attack. You actually can't digitally
sign your messages at any of the "outer" layers of the encryption, because if
you did, you would have to use a key that the remailers knew to be able to
validate the signature in the first place. If you did that, you'd be telling the
relays who you are, which defeats the entire point. Obviously you can sign the
innermost message with some appropriately pseudonymous GPG key which your
recipient is aware of, but signing the outer layers is impossible. That leads to
the following nefarious idea: an attacker can flip a bit (or several bits) in
the *payload part* of the message as it is sent from you to re@mailer1.site -
easy enough to do, since SMTP is generally plaintext. This will make the message
eventually decrypt to gibberish, of course, but re@mailer1.site has no idea
about that - all it does is strip off a layer of encryption and pass the
message. The second remailer does the same, and now a nonsense message is sent
to controversial@mail.site. If your adversary is monitoring both you and
controversial@mail.site, your incoming message (which they flipped bits in) and
the outgoing corrupted message are now pretty strongly linked. Eek!

This problem actually turns out to be really annoying to fix. We can't sign the
outer layers to make them tamper-evident, so the relays can't drop corrupted
data mid-stream, and allowing the corrupted message to reach the end of the
relay chain is exactly what allows the attack in the first place.

One approach is to actually encrypt part of the routing data together with the
payload message, using a block cipher with a very large block size - ideally,
the size of the entire messages. Good block ciphers have the property that, if
you flip even one bit of the input, every bit in the output changes with
probability 1/2. Let's imagine we have two functions:

  AE(k, m): asymmetrically encrypt m to recipient public key k
  SE(k, m): symmetrically encrypt m with secret key k, with a wide block cipher

and suppose we're going to use three relays, with public keys Kr0, Kr1, and Kr2,
and we're ultimately trying to mail a destination d with key Kd.

In our old, tagging-vulnerable scheme, we were sending:

  AE(Kr0, "r1" + AE(Kr1, "r2" + AE(Kr2, "d" + AE(Kd, m))))

Since all those asymmetric encryptions are unauthenticated (there's nothing for
the relays to authenticate them with), bits can be freely flipped in the outer
message's payload part and those flips won't cause a problem until the last leg.

However, if we instead generate a temporary symmetric key Kt and do this:

  AE(Kr0, "r1" +
    AE(Kr1, "r2" + Kt +
      SE(Kt,
        AE(Kr2, "d" + AE(Kd, m)))))

now, when the message reaches the middle relay r1, r1 does an extra step: it
strips a layer of symmetric encryption off both the message *and* the rest of
the routing data. Since the block cipher has this neat property that any bit
flip is likely to destroy the message, any modification of the message before it
reaches r1 is likely to leave the rest of the routing data invalid after r1
unwraps it, and r1 can drop it. An attacker can still tag messages going into
the network, but they will be dropped halfway through and not emerge as
detectable corrupt messages.

This approach, more or less, is what the "mixminion" mail relay does, and this
is known as a "type 3" relay. Mixminion also moves away from SMTP as a transport
for packets being relayed and towards a protocol running over TLS, so it's much
harder for attackers to either see or modify messages in transit throrugh the
network.

You can read more about Mixminion here:

=> https://www.mixminion.net/minion-design.pdf

Mixminion has one other fascinating feature, which I mentioned above briefly:
pseudonymous replies. In the earlier designs it was impossible to actually reply
to a pseudonymous message and have the reply be delivered, since the mix network
would have no idea where to send the reply to without you having some public
email address to receive it at - and if you had that address and included it,
you wouldn't be anonymous to begin with!

Mixminion's approach to that is what's called a "Single-Use Reply Block": when
sending a message, you can include *just the routing information* needed to send
a reply to you, onion-wrapped as usual, and leave the message body empty. The
recipient of your message can, through a bit of cryptographic trickery that I've
papered over above, put their message into that empty "slot" and send it to you,
and it will traverse the remailer network back to you exactly as if they had
pseudonymously emailed you - but without them ever actually knowing your email
address. Very slick!

Mixminion also added a couple of other features which are a little less
interesting for this post but needed to make the protocol work well: a notion of
directory servers, so that relays can rotate their keys often and clients can
still know about them, and a notion of a "nym server", which is a server that
accepts email to some address (like you@nym.server) and uses one of a cache of
Single-Use Reply Blocks it has for you to fire a "reply" to you over the
remailer network. That allows for email addresses that are really pseudonymous
and really publicly mailable without someone needing to use mixminion themselves
to do it, which is pretty darn cool.

There's one more protocol (sort of) that deserves a mention here: Sphinx. While
Mixminion and the previous protocols work by layering asymmetric encryption many
times, this takes a bit of CPU to do and produces very long messages - eg, if
your relay path has 10 hops, doing 10 layers of RSA would be a few kilobytes of
overhead if not more. This is fine for email but rather unsatisfactory for
smaller-scale stuff.

One of the key insights of Sphinx is that one can use Diffie-Hellman to produce
a shared secret with each relay in turn and then use symmetric encryption (only)
to create the "onion" layers. At each layer, one computes a "blinding factor"
for that layer, which is multiplied into the Diffie-Hellman public value at that
layer. For example, if the blinding factors are b0, b1, ..., then the onion
layers might have secrets:

  g ** x
  g ** (x * b0)
  g ** (x * b0 * b1)

which allows reuse of the same single group value at each hop, with one blinding
factor exponent being divided out each time. When you send, you can pre-compute
what each layer's public exponent will be with its blinding factors divided out
to know what the shared secrets will be, and use those to derive the keys used
for the symmetric encryption.

That explanation is probably very hard to follow; I recommend the Sphinx paper
instead:

=> http://www0.cs.ucl.ac.uk/staff/G.Danezis/papers/sphinx-eprint.pdf

or maybe this specification, regrettably from something cryptocurrency-related:

=> https://katzenpost.mixnetworks.org/docs/specs/sphinx.html

That's it on this topic for now; I will write more in the future about Tor,
Freenet, I2P, and other anonymity projects since I find the whole thing rather
fascinating.

Until next time!

[1]: In the Year of Our Lord 2022, this is still often true.

[2]: I realize that people don't only use anonymous mail to send terrible
     opinions, in the same way that people don't only use Tor to sell drugs.

[3]: Of course, someone monitoring both legs would.

[4]: But it could be linked by the two of them cooperating, or an adversary
     capable of monitoring all their connections - like, say, a large-ish
     government agency.

[5]: This technique has a lot of fun uses. For further reading, take a look at
     Pond:

=> https://github.com/agl/pond

[6]: It is notoriously easy to come up with a clever, seemingly-secure scheme
     here that still allows recovery of correlations via statistics.

$#t Cipherpunks, Mixmaster, and Mixminion: Anonymous Email
$#s An overview of email anonymization tools and attacks against them
$#o crypto, privacy
$#u 6a0469a7-7e8f-455e-9217-237dcf7b84cd