Constant-Time Operations

It's often pleasing to think of a piece of software as an entirely abstract
thing: it takes inputs, it does some work, it produces outputs, we all move on
with life. This way of looking at things is extremely useful for formal
reasoning about programs, but unfortunately it is a crude fiction for children.
All of our software ultimately executes on physical hardware, and physical
hardware has a very nasty way of destroying neat theoretical models[1]. In
particular, the twin demons of a great deal of security-sensitive software are
timing and power.

The timing problem is simple to explain. Suppose[2] that I have been handed some
data and an alleged HMAC of it, and I wish to verify that HMAC against a secret
key I have. I might write this function:

  int hmac_ok(char *secretkey, char *data, char *alleged) {
    char good[HMACLEN];
    hmac(secretkey, data, good);
    return memcmp(good, alleged, HMACLEN) == 0;
  }

internally, memcmp() is implemented something like this[3]:

  int memcmp(char *a, char *b, int len) {
    int i;
    for (i = 0; i < len; i++) {
      if (a[i] != b[i])
        return a[i] < b[i] ? -1 : 1;
    }
    return 0;
  }

on paper this looks quite proper: as long as we check that the alleged buffer is
indeed HMACLEN long, we should be quite alright. However, there's a lurking
problem. If we call memcmp() on two strings that differ in only the first byte,
the CPU will do only a single compare before returning. If we call it on two
strings that have the same first byte, but differ in the second byte, it will do
a compare, a jump, and another compare before returning. Since all these
operations take actual time on actual hardware, we have what's called a "side
channel": the software is leaking information about its internal state through
some means other than its actual interface[4]. In particular, here it's leaking
which byte of the HMAC first mismatched, which is very important to an attacker!
They can now try 256 different HMAC values with different first bytes, and
whichever one is slower than the other 255 has the correct first byte. Using
that now-known first byte they can try 256 different HMAC values with that first
byte and a different second byte, check which of those is slowest, ... and so
on.

But wait: is this a real problem? These compares take like, one zeptosecond on a
modern CPU, and there's so much else going on that introduces random delays that
you couldn't possibly actually measure this kind of thing... right?

Regrettably, it depends on a lot of things. Even relatively small differences in
timing can be used by a remote attacker (over the network! with its own
millisecond delays!) thanks to the power of statistics. There is a very good
paper entitled "Remote Timing Attacks are Practical" from two Stanford
researchers in which they demonstrated recovering RSA private keys from other
systems over the Internet.  Even though the difference in an individual run's
time might be miniscule, over many iterations it introduces a detectable bias
into the distribution (statistically) of runtimes, which can be enough for an
attacker to recover bits of your state. That said, in this day of cloud
services, it's often possible for an attacker to be executing *on the same
machine* as you, just in a different VM, in which case they have access to very
high-precision timing data indeed. 

So, what to do? Well, for memcmp, we can write a constant-time analog that is
good enough for our purposes:

  int constant_memcmp(char *a, char *b, int len) {
    char r = 0;
    int i;
    for (i = 0; i < len; i++)
      r |= a[i] ^ b[i];
    return r;
  }

Here, we're taking advantage of the fact that a ^ b is 0 if a and b are equal,
but computing a ^ b takes the same time regardless of the values of a and b -
this property is called "data independence". As we walk both strings, we xor
together each matching pair of bytes, then or that into r, which also takes
constant time. The end result is that if any pair of bytes a[i] and b[i]
differed, a[i] ^ b[i] will be nonzero, so at least one bit will be ored into r
and returned, and we thus return a nonzero value when the strings aren't equal
and a zero value when they are.

However, this isn't such an easy problem to fix in general. Most bitwise
operations are data-independent in this way on modern hardware, but (eg)
conditional branches never are, as are some surprising operations like division.
One can design cryptographic code to avoid data dependence, but it's certainly
not easy, and doing so for (eg) RSA is an open research problem. The NaCl crypto
library is an example of one designed to avoid data dependence, but it requires
essentially custom cryptographic primitives to do so.

I mentioned them briefly above, but power side channels deserve another mention:
while they aren't *usually* practical as an attack vector on either servers or
consumer hardware, since those devices have wildly unpredictable power draw and
in any case it's usually much harder to smuggle an ammeter into a target's home
slash server rack, they are a real concern for secure embedded hardware. For
example, TPMs and other such enclaves *do* have to worry about an attacker
measuring minute fluctuations in their power draw to reconstruct their internal
state. Worse, power is not just an output but an input, and it is often possible
to induce interesting misbehavior in chips by varying their input voltages...
but that's a topic for a different post.

[1]: Examples are legion, but three prominent ones are the "rowhammer" genre of
     DRAM data corruption attack, the "spectre" genre of branch-predictor attack
     (another side-channel!), and the impossibility of securely erasing disks
     in practice because of intelligent sector remapping. It's rough out there
     for theorists.

[2]: An even *more* obvious example would be checking a password against a
     database, but at this point everyone knows you have to hash passwords,
     and hashing passwords removes some of the value of the timing attack here
     since it is only the hash that could be recovered. Password hashes are
     still useful because you can subject them to offline attack, but it would
     muddle the point I'm trying to make.

[3]: It is not, of course; real memcmp()s work in larger blocks and are
     usually handwritten assembly.

[4]: If you wanted to you could consider all these side channels part of "the
     interface", but enumerating what they actually are can turn into quite a
     hobby - e.g., TEMPEST-type attacks on CRTs, or the sidechannel EM noise
     most keyboard mechanisms emit in use. Ugh!

$#t Constant-Time Operations
$#s In cryptography, what are constant-time operations and why do they matter?
$#o crypto, programming
$#u 2df192ce-3463-4e15-9951-72e39285da26