One of the central problems in cryptography is "integrity protection", or
proving that some given data is as it was when it was generated or validated
by some trusted party. This matters especially for things like ensuring that
a software package wasn't tampered with, or that an email is not modified since
the sender produced it. Usually this is done using asymmetric signing schemes,
but these are slow and have large keys and require clever mathematics to
implement properly, so another option that is useful in some contexts is
symmetric signing schemes - i.e., ones where the same (private) key is used both
to apply the protection and to check that the protection is intact. These
schemes aren't useful for either of the cases given above: if a user has the
required key to check that a software release with a symmetric signature is
intact, then they also have the required key to forge a release of that
software, which is what security professionals call a "major bummer" when it
happens.

However, symmetric signatures can be quite useful in two scenarios:

1. The data is being returned to the point of origin but passing through
   untrusted hands en route, so the secret key is held and used only by one
   party;
2. The secret key is negotiated as part of some other protocol and used to
   provide transient protection for some kind of bulk data.

An example of a system of the first type might be a login server that issues
users "tokens". Assuming we have a symmetric signing operation:

  sign(key k, message m)
  verify(key k, message m)

The login server might support an operation like "login", which takes a username
and password, checks them against its database, and if successful returns:

  sign(LOGIN_SERVER_KEY, "logged in:" + user + time)

which can then be re-validated by the server at a later date using verify() with
that same key[1].

An example of a system of the second type is pretty simple to imagine: you and I
meet in person, let's say, and establish a shared secret key; I then transmit to
you a large file signed with that key and you verify it. Easy!

So, how does this symmetric signing actually work in practice? Well, there are
quite a few different schemes, but one of the most common and easiest to use,
HMAC, involves use of a hash function. Since hash functions are irreversible[2],
in theory, if I generate:

  hash(secretkey + data)

nobody else, observing that hash value, can re-derive the secret key, and
therefore nobody could produce hash(secretkey + otherdata). Admirably simply and
clear... but it doesn't actually provide the security guarantee you might want,
and as a result the actual HMAC standard, RFC 2104, has a more complicated
scheme involving two nested applications of the hash function.

Wait, though. Why doesn't it work? To understand, we need to dig a bit into how
most modern hash functions work.

Most hash functions are built from what is called a "one-way compression
function", which takes a previous state and a block of data and produces a new
state. The previous state and the block of data don't have to be the same size,
but they often are. For example, here's a one-way compression function with a
state size of 4 and a block size of 4:

  uint32_t compress(uint32_t state, uint32_t block) {
    return state ^ block;
  }

Obviously this is a garbage compression function for cryptographic purposes but
it'll serve to illustrate the point. From this compression function, we can
produce an actual hash function, which takes an arbitrary amount if input and
produces a fixed-length hash value of it, like so:

1. Pad the input (which is of arbitrary length) to a multiple of the block
   size;
2. Break the input into blocks of the block size, b[0] ... b[n]
3. Let the initial state s be some fixed value (the "initialization vector")
4. For each block b[i] of input, do: s = compress(s, b[i])
5. Extract a hash value from the state - this is called "finalization".

This construct is neat and simple to implement, and as a bonus it has some
pleasant provable security properties. However, it also has some major flaws,
including what is called the "length extension attack", which we'll describe
here, along with how HMAC prevents it.

In step 1, I somewhat breezily instructed you to "pad the input", and your first
question, as a security-conscious implementer, ought to have been "pad with
what?". Obviously simply padding with any fixed value p is insufficient, because
then messages ending or not ending in p will hash to the same value. What MD
schemes often do instead is what's called "length padding", in which the length
of the input is used as part of the padding. This clever scheme ensures that,
eg, hash("AAA") and hash("AAA\0"), if the padding character is \0, are different
since their lengths are different.

Let's call this function pad() and declare that it adds some proper padding to
the input. Now, in C, our finished hash function[3] might look like this:

  uint32_t hash(uint8_t *message, int len) {
    uint32_t state = IV;
    int i;

    for (i = 0; i < len - (len % 4); i += 4)
      state = compress(state, *(uint32_t *)(message + i));

    state = compress(state, pad(message + i, len));
    return state;
  }

Note that a padding block is always added to the end, even if the original input
was a multiple of the block size - in that case, the last block compressed in is
just the length and padding. It's often tempting, as our example hash does,
simply return the internal state as the final hash - e.g., in the now-broken
SHA-1, the finalize() function is basically just the identity function.

There is one curious property of this construct though: given an input message
m, the hash function returns its own internal state after hashing m with pad(m)
appended. Given this internal state, it is very possible to append more data and
receive a valid hash! For example, if I compute:

  hash("secretkey" + "message")

what is actually returned is basically:

  h = compress(compress(...(compress(IV, "secr"), "etke"), ..., pad(orig)))

it is very easy to then just compute:

  h' = compress(h, "zomg")

I could then append my own, new, padding block as required by the construct:

  h'' = compress(h', pad(new))

and it will be the case that:

  h'' = hash("secretkey" + "message" + pad(orig) + "zomg")

despite the fact that I don't know secretkey, I have produced an (admittedly
oddly-shaped) message apparently symmetrically signed with secretkey. Depending
on the actual application this may or may not be a real problem but it's
certainly concerning, and makes this signing construct difficult to use safely
without knowing more about how it's used.

That, of course, is no good in a security system, so what do we do? Well, the
real problem here is that the internal state of the hash is returned such that
an attacker can simply continue to run the hash computation. We could use a
wider hash which we then "narrow" - eg, if I made the internal state a uint64_t
in my example and returned only the lower 32 bits of it, someone trying to forge
a message would have no way of reconstructing the upper 32 bits of that state
and therefore wouldn't be able to extend the hash. This is called the "wide
pipe" approach to fixing length extensions, but it really has to be designed
into the hash function at birth, and we want to be able to securely use our
existing hash functions if we can.

Thus, enter HMAC. HMAC addresses the problem of obscuring the hash's internal
state by computing:

  hmac(k, m) = hash((k xor o) + hash((k xor i) + m))

where o and i are constants, chosen just to ensure that the two hashes
definitely do start with different bytes. This approach absolutely prevents
length extension, because the state of the "inner" hash, which could be
length-extended, is hidden in the output of the "outer" hash and can't be
recovered, and the input to the "outer" hash is always of exactly fixed length
and can't be length-extended.

One could theoretically do:

  hmac(k, m) = hash(k + hash(k + m))

and the result would likely still be secure, but the use of the o and i
constants allows HMAC's security proof to be a special case of the security
proof of a more general construct called NMAC:

  nmac(k0, k1, m) = hash(k0 + hash(k1 + m))

You can read more about that, along with all the equations you could ever
desire, in a PDF with sha256sum
69013ff255612c4f18e9ea07a7a9dd46a3ee8b71553650292777d2774c2046ad.

Anyway, that's what a length extension attack is and how the HMAC nested-hash
construct avoids it!

[1]: It would be a very useful lie to say that this is how Kerberos works, and
     through a very specific lens it is, but it uses a symmetric cipher
     (regrettably often something like DES) to provide both integrity
    protection and confidentiality. Yikes.

[2]: Meaning that given hash(m), you cannot easily re-derive m. This is formally
     called a "first preimage attack". The value of "easily" is, of course,
     subjective.

[3]: This function obviously is unsuitable for real use. Please do not email
     me with cryptanalyses of it.

$#t Length Extension Attacks
$#s How length-extension attacks work against Merkle-Damgard hash functions
$#o crypto
$#u c477f106-2c98-45f5-a51d-14495ac36716