/* X.509 Certificates
 * This blog post is a C program - you can compile it with:
 *     cc -o post post.c -lbearssl
 * It requires bearssl to compile and run.
 *
 * If you've ever set up a web server with https support, you've probably at
 * some point been called upon to interact with cryptic files with names ending
 * in .pem, which contain blocks of base64ed text. I've seen these a bunch of
 * times myself and always vaguely known that there's like, a public key and a
 * signature in there, but not really what's going on in any detail. I decided
 * to find out, by setting out to write a program that would generate a new
 * certificate good enough to be understood by OpenSSL. This is that program.
 *
 * We're gonna use bearssl for the actual crypto stuff (of which there won't be
 * all that much) because the details of how to generate elliptic curve keys and
 * such aren't germane to what we're doing. I did also rampantly ignore errors
 * and other failure modes, so maybe don't use this in production.
 *
 * Alright, let's get right into the C program part with some header files! */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <bearssl.h>

/* It wouldn't be a real blog post without some seriously budget error handling,
 * so here we are: */
void ouch(const char *msg) {
    fprintf(stderr, "%s\n", msg);
    exit(1);
}

/* And a low-rent debugging aid, too: */
void printhex(const char *pre, const uint8_t *bytes, size_t len) {
    const size_t LINE = 32;
    for (size_t i = 0; i < len; i++) {
        if (!(i % LINE)) {
            if (i)
                fprintf(stderr, "\n");
            fprintf(stderr, "%s:", pre);
        }
        fprintf(stderr, " %02x", bytes[i]);
    }
    fprintf(stderr, "\n");
}

/* Alright, on to the main stuff :)
 *
 * X.509 is a standard that defines a particular widely-used digital certificate
 * format, used for TLS (so HTTPS), S/MIME (so one type of encrypted email), and
 * a lot of other uses.
 *
 * X.509 heavily uses a data encoding called ASN.1 ("Abstract Syntax Notation
 * One"), which is a way of serializing and deserializing messages, like a very
 * old and complicated predecessor to JSON or Protocol Buffers. In X.509,
 * certificates, cryptographic keys, and a bunch of other things are defined as
 * ASN.1 messages. ASN.1 has a bunch of possible encodings; the one that X.509
 * uses is called "DER" (Distinguished Encoding Rules), which is deliberately
 * as unambiguous as possible, meaning messages can generally only be encoded
 * in a single way. This is intended to make it easy to check signatures on
 * messages, since a given message should always serialize to the same bytes.
 *
 * Since blobs of binary data are a bit unwieldy, especially for sending over
 * email, there's a commonly-used standard called PEM ("Privacy-Enhanced Mail")
 * which defines a way to encode... well, anything, but most often ASN.1
 * messages for transport over email. To do that, they base64 the data payload
 * and wrap it in a header/footer to make it easy to extract from a larger
 * message.
 *
 * In fact, let's look at an example PEM file.
 * If you were to generate a new certificate and export it as a PEM file it
 * might look like this:
 *
 * -----BEGIN CERTIFICATE-----
 * MIIBWzCB4gIUEjf0/5Z1iEtGwdVVtpwEVqnItQ4wCgYIKoZIzj0EAwIwEjEQMA4G
 * A1UEAwwHdGVzdGluZzAeFw0yNDAzMDIyMzEzMjVaFw0yNDA0MDEyMzEzMjVaMBIx
 * EDAOBgNVBAMMB3Rlc3RpbmcwdjAQBgcqhkjOPQIBBgUrgQQAIgNiAASdNB7Jt9ni
 * 0NSruQhTGoNyQVjuVLvQAGKITVuUwDd4XwzXToE1FZc6RgAo4tMX5uMh2VIQwxNo
 * BJ5qc7bXwJFeam0j9MKHjDmPhmhWg+3HKO3WK4Ihr467A6DvESybjpYwCgYIKoZI
 * zj0EAwIDaAAwZQIwLw3igZeDUJz0FdnrClommvBzBJ7tn8JkzAl9TrO007vg3fVe
 * 2IcIHA7qsxUqEwcvAjEAw21lCAboqosIwfJnGY3T/byfp97HzXcl4+JCUHyiQ6mD
 * Fzajy+EWuCSWeDpxznk0
 * -----END CERTIFICATE-----
 *
 * This was generated by:
 *     openssl genpkey -algorithm EC -out key.pem \
 *         -pkeyopt ec_paramgen_curve:P-384 \
 *         -pkeyopt ec_param_enc:named_curve
 *     openssl x509 -new -subj /CN=testing/ -key key.pem -out cert.pem
 *
 * And the corresponding key.pem looks like this:
 * -----BEGIN PRIVATE KEY-----
 * MIG2AgEAMBAGByqGSM49AgEGBSuBBAAiBIGeMIGbAgEBBDAOs2FsRLcx6W8yUktn
 * gPmo1X+IkJ/lGlxh4wVLzrn45TTy3YrzDpCc/51zx4VC/GihZANiAASdNB7Jt9ni
 * 0NSruQhTGoNyQVjuVLvQAGKITVuUwDd4XwzXToE1FZc6RgAo4tMX5uMh2VIQwxNo
 * BJ5qc7bXwJFeam0j9MKHjDmPhmhWg+3HKO3WK4Ihr467A6DvESybjpY=
 * -----END PRIVATE KEY-----
 *
 * If we were to un-base64 the lines in between the BEGIN and ENDs there, we'd
 * indeed see ASN.1 messages, although we wouldn't (yet!) know how to interpret
 * them.
 *
 * Let's start with an explanation of (part of) ASN.1, since it's so central to
 * what we're doing here. */

/* Regrettably the standard for ASN.1's encodings (DER included) is produced by
 * the ITU (International Telecommunication Union) as X.690, and is one of the
 * most opaque standards documents I've ever read:
 *
 * => https://www.itu.int/rec/T-REC-X.690-202102-I/en
 *
 * ASN.1 is extremely complicated and has lots of variants and encodings and
 * stuff, but we only have to care about DER, which attempts to provide a
 * single fairly compact encoding for any given message.
 * At its core, ASN.1 in DER and ignoring all of the bizarre options is a
 * self-describing[fn1] type-length-value scheme that looks like this:
 *     tag:     1 byte [fn2]
 *     length:  variable
 *     payload: (length) bytes
 * where the length is either in "short form" (up to 0x7f, with the high bit
 * clear) or "long form" (high bit set, low 7 bits encode how many bytes are
 * used for the length (!)). For example, a 300-byte payload (0x012c) gets the
 * three length bytes 82 01 2c.
 *
 * If you're wondering if it's annoying that the length is of variable size: it
 * is. Until you've encoded the payload (which usually involves encoding nested
 * messages) you don't know how long the length is going to need to be, so
 * you're forced to either do a first pass where you size everything then a
 * second pass where you actually encode, or to guess how big the size fields
 * are and then adjust if necessary. The code below does the first sizing pass,
 * but it's very annoying because you have to buffer the whole message in
 * memory.
 *
 * There's one other important and fundamental concept we need to know about:
 * OIDs, or "object identifiers". Lots of things in both ASN.1 and X.509 are
 * identified by "object IDs", which are basically a hierarchical tree[fn3] of
 * namespaces indexed by integers. An OID is usually written as a dotted
 * sequence of unsigned integers, like 1.2.36.1 or whatnot, which means:
 *
 *     1           (= ISO)
 *     . 2         (= member body)
 *     . 36        (= australia)
 *     . 1         (= government)
 *
 * so 1.2.36.1 is the object identifier for the Australian government. Another
 * example is 1.2.840.10045.2.1, which identifies elliptic-curve public keys
 * via:
 *
 *     1           (ISO)
 *     . 2         (member body)
 *     . 840       (US)
 *     . 10045     (ANSI X9.62, the ECDSA standard)
 *     . 2         (keyType)
 *     . 1         (ecPublicKey).
 *
 * There are OIDs for types, for cryptographic algorithms, for standards bodies,
 * for places, for... you get the idea. There are a lot of them.
 *
 * Here, we use an array of 16 ints, and terminate the OID with a -1, since the
 * components of OIDs are always nonnegative.
 * The components are allowed to be arbitrarily big, and the OIDs arbitrarily
 * long, but this works for our use cases. */
typedef int oid[16];

/* And here are the OIDs we'll be needing here, stored as described above. These
 * values come from:
 *     1.2.840.10045.2.1:   called "id-ecPublicKey" in RFC 3278, section 8.1
 *     1.3.132.0.34:        RFC 5480 2.1.1.1, set of named curves
 *     1.2.840.10045.4.3.2: RFC 5480 2.1.2; this document is also authoritative
 *                          for which parameters you need to use for crypto
 *                          types in X.509!
 *     2.5.4.3:             from ITU-T X.520 */
const oid OID_ECPUBKEY = { 1, 2, 840, 10045, 2, 1, -1 };
const oid OID_SECP384R1 = { 1, 3, 132, 0, 34, -1 };
const oid OID_ECDSA_SHA256 = { 1, 2, 840, 10045, 4, 3, 2, -1 };
const oid OID_COMMON_NAME = { 2, 5, 4, 3, -1 };

/* Here are the ASN.1 value types we support encoding, although there are a lot
 * of other ones (many types of strings[fn5], reals, other composite types,
 * etc): */
enum asn1type {
    ASN1_INTEGER = 0x02,
    ASN1_BITSTRING = 0x03,
    ASN1_OCTETSTRING = 0x04,
    ASN1_OID = 0x06,
    ASN1_UTF8STRING = 0x0c,
    ASN1_SEQUENCE = 0x30,
    /* For reasons that are absolutely obscure to me, X.690, the official
     * standard for ASN.1, does not actually say what the tag values are for
     * these types. It is completely baffling - the documentation about how
     * to encode them is simply silent on the matter. I figured these out by
     * looking at certs generated by OpenSSL. */
    ASN1_UTCTIME = 0x17,
    ASN1_GENERALIZEDTIME = 0x18,
    ASN1_SET = 0x31,
};

struct asn1 {
    enum asn1type type;
    /* The length is always present for some types; for others it needs to
     * be computed depending on the value. Lack of a computed length is
     * marked by a negative length. */
    ssize_t len;
    union {
        /* If this value is a primitive type, it has a definite length
         * and this field points to a buffer containing its encoded form.
         */
        uint8_t *b;
        /* If this value is a composite type (a SEQUENCE or SET or
         * whatever), this points to its first child value; subsequent
         * children are linked through the next field on the children. */
        struct asn1 *l;
    } val;
    /* If this value is part of a list, this field is used to link to the
     * next element in that list. */
    struct asn1 *next;
    /* Flags for internal use of the ASN.1 encoder. */
    unsigned int flags;
};

enum {
    /* A 'struct asn1' is finished if its value is complete and its length
     * is known. Primitive types are always finished, composite types have
     * to be explicitly finished via asn1_finish() to indicate that no more
     * children will be added. */
    ASN1F_FINISHED = 0x01,
};

/* And here are our primitives for managing them. We're going to rampantly use
 * heap allocations here to keep the code as simple as possible, but in a real
 * X.509 library you probably wouldn't want to do this.
 *
 * The implementations of those functions are down below, since they're not
 * interesting - they just make a new asn1 object and fill in some of its
 * fields. */
struct asn1 *asn1_new_int(uint64_t val);
struct asn1 *asn1_new_bigint(const uint8_t *bytes, size_t len);
struct asn1 *asn1_new_oid(const oid oid);
struct asn1 *asn1_new_bitstring(const uint8_t *bytes, size_t len);
struct asn1 *asn1_new_octetstring(const uint8_t *bytes, size_t len);
struct asn1 *asn1_new_utf8string(const char *str);
struct asn1 *asn1_new_utctime(time_t time);
struct asn1 *asn1_new_generalizedtime(time_t time);
struct asn1 *asn1_new_sequence();
struct asn1 *asn1_new_set();

/* This is a nasty hack - some fields in X.509 certs are actually themselves
 * ASN.1 values that are encoded, then have their encoded form wrapped in a
 * bitstring. When we do this, we serialize the child immediately rather than
 * delaying serializing it!
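 *
 * As a rough, standalone sketch of what that wrapping amounts to on the wire
 * (this is not the implementation used below; wrap_bitstring() is a made-up
 * name, and it only handles bodies short enough for a one-byte length): the
 * result is the BIT STRING tag 0x03, a length, one byte counting the unused
 * bits (always 0 here), then the child's encoded bytes verbatim.
 *
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: wrap already-encoded DER bytes in a BIT STRING.
// Returns the number of bytes written, or 0 if the body would need a
// long-form length or the output buffer is too small.
size_t wrap_bitstring(const uint8_t *der, size_t derlen,
                      uint8_t *out, size_t outlen)
{
    assert(der && out);
    size_t body = derlen + 1;       // +1 for the unused-bits byte
    if (body > 0x7f || outlen < 2 + body)
        return 0;
    out[0] = 0x03;                  // BIT STRING tag
    out[1] = (uint8_t)body;         // short-form length
    out[2] = 0x00;                  // no unused bits in the final byte
    memcpy(out + 3, der, derlen);   // the child's encoding, untouched
    return 2 + body;
}
```
 *
 * Wrapping the nine-byte sequence 30 07 02 01 42 02 02 00 cc this way gives
 * 03 0a 00 30 07 02 01 42 02 02 00 cc, the same vector selftest() checks
 * further down.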
 */
struct asn1 *asn1_new_bitstring_wrapping(struct asn1 *child);
struct asn1 *asn1_new_octetstring_wrapping(struct asn1 *child);

void asn1_add_child(struct asn1 *parent, struct asn1 *child);
int asn1_is_primitive(const struct asn1 *asn1);
void asn1_free(struct asn1 *a);

/* When we're done adding children to a node, we need to finish it - this also
 * computes the node's length. */
void asn1_finish(struct asn1 *node);

size_t asn1_serialize(const struct asn1 *root, uint8_t *buf, size_t len);
uint8_t *asn1_serialize_alloc(const struct asn1 *root, size_t *len);

/* And we'll need one helper function, which will tell us how many bytes the
 * entire encoded form of an asn1 value will take, including its tag and
 * (variable length) length: */
size_t asn1_encoded_length(const struct asn1 *root);

/* And last (but definitely not least), we're going to want the ability to emit
 * an ASN.1 message as a PEM file, so that we can actually emit the generated
 * certificate and key. This function will do the base64 encoding and wrap the
 * block of base64 text in PEM-style BEGIN/END lines. The exact format is
 * defined in RFC 7468, although it's quite simple:
 *
 * => https://www.rfc-editor.org/rfc/rfc7468.txt */
void asn1_emit_pem(const struct asn1 *asn1, const char *tag);

/* Alright, let's get to implementing the more complicated ASN.1 stuff -
 * remember that the various asn1_new_*() functions and asn1_add_child() are
 * implemented down below. */

/* This function takes an ASN.1 message and encodes its length field into a
 * given buffer, returning how many bytes that length field occupies. Passing a
 * NULL buffer causes it to return the size without actually serializing
 * anything. */
size_t asn1_serialize_len(const struct asn1 *root, uint8_t *buf, size_t len) {
    /* The ASN.1 length encoding is described in X.690 8.1.3.5.
     * There are two forms: the short form, which uses a single byte
     * <= 0x7f, and the long form, in which:
     *   - the high bit of byte 0 is 1
     *   - the low 7 bits of byte 0 encode the length of the length (n)
     *   - the following n bytes encode the length, most significant
     *     byte first */

    /* Every length is going to require at least one byte to serialize... */
    assert(!buf || len > 0);

    if (root->len < 0x80) {
        if (buf)
            buf[0] = root->len;
        return 1;
    }

    /* Now figure out how many bytes the length will take: */
    size_t lenlen = 0;
    size_t dlen = root->len;
    while (dlen) {
        dlen = dlen >> 8;
        lenlen++;
    }

    if (!buf)
        return 1 + lenlen;

    assert(len >= 1 + lenlen);

    /* Set the high bit of the length to indicate that this is in long form,
     * and encode the rest of the length's length in the low bits. */
    buf[0] = 0x80 | lenlen;
    dlen = root->len;
    /* And write the bytes of the length out, backwards so that the lowest 8
     * bits of it are at the end. */
    for (size_t j = lenlen; j > 0; j--) {
        buf[j] = dlen & 0xff;
        dlen = dlen >> 8;
    }

    return 1 + lenlen;
}

/* This serializes a given ASN.1 message into a given buffer, which must be long
 * enough to receive the message. It does that by, if the message is a
 * primitive, simply copying its encoded bytes into the buffer (since primitive
 * messages are stored encoded) or, if it is composite, then by encoding all of
 * its submessages into the buffer. The tag and computed length are always added
 * before the serialized message bodies in either case. */
size_t asn1_serialize(const struct asn1 *root, uint8_t *buf, size_t len) {
    assert(len >= asn1_encoded_length(root));
    assert(root->flags & ASN1F_FINISHED);
    buf[0] = root->type;
    size_t lenlen = asn1_serialize_len(root, buf + 1, len - 1);
    if (asn1_is_primitive(root)) {
        /* Primitive -> just copy the already-encoded body in. */
        memcpy(buf + 1 + lenlen, root->val.b, root->len);
        return 1 + lenlen + root->len;
    } else {
        size_t i = 1 + lenlen;
        /* Composite -> serialize all the submessages, keeping track of
         * how much buffer space we've used so far.
         */
        for (struct asn1 *c = root->val.l; c; c = c->next) {
            /* Note: if we walk to the end of the buffer here, we
             * have to stop serializing! Reporting zero bytes left
             * makes the child's own size assertion fire instead of
             * letting it write past the end. */
            size_t left = len > i ? len - i : 0;
            i += asn1_serialize(c, buf + i, left);
        }
        return i;
    }
}

/* This is a wrapper function which determines how long a message would be when
 * serialized, allocates a buffer of that size, serializes into it, then returns
 * the allocated buffer. */
uint8_t *asn1_serialize_alloc(const struct asn1 *root, size_t *len) {
    size_t l = asn1_encoded_length(root);
    uint8_t *buf = malloc(l);
    if (!buf)
        ouch("out of memory");
    asn1_serialize(root, buf, l);
    *len = l;
    return buf;
}

/* This is also a wrapper function which returns the length a message would have
 * when serialized. */
size_t asn1_encoded_length(const struct asn1 *root) {
    assert((root->flags & ASN1F_FINISHED) && "length of unfinished node");
    assert(root->len >= 0 && "unmarked length");
    return 1 + asn1_serialize_len(root, NULL, 0) + root->len;
}

/* When we generate the actual certificate, we'll need to generate a keypair for
 * it, too, so that it can actually be used for anything. */
struct keypair {
    uint8_t sbuf[BR_EC_KBUF_PRIV_MAX_SIZE];
    uint8_t pbuf[BR_EC_KBUF_PUB_MAX_SIZE];

    /* These are filled in by bearssl as we generate the keys. */
    br_ec_private_key s;
    br_ec_public_key p;
};

/* The only thing we need to do with a keypair is generate it - signatures are
 * handled inline in x509_new_cert() below. */
void keypair_gen(struct keypair *kp);

/* So now in theory all we need to do is:
 * 1. Create a `struct cert` and fill in all the fields;
 * 2. Serialize most of it (all the non-sig parts);
 * 3. Compute what the signature should be;
 * 4. Serialize the rest of it;
 * 5. Emit it wrapped up into a PEM. */

/* This function creates a new ASN.1 value mapping to a "common name" of the
 * supplied name.
 */
struct asn1 *x509_new_commonname(const char *name) {
    /* A bit about names in X.509: because X.509 gets its view of names from
     * X.500, names are extremely general. X.500 was based on the
     * (fundamentally misguided) idea that everyone would stick everything
     * into a single hierarchical namespace, and as a result it has an
     * extremely powerful idea of how to name things relative to other
     * things. Nobody uses even a fraction of this power, and instead what
     * people do[fn4] is generally jam a single field (called "common name")
     * in with a string name that has some meaning itself.
     *
     * However, because the mechanism is very general, the encoding is also
     * very general. A name is actually:
     *     SEQUENCE OF {
     *         SET OF {
     *             SEQUENCE {
     *                 OID key;
     *                 ANY value;
     *             }
     *         }
     *     }
     * where each element of the outer sequence is notionally relative to
     * the previous level, so this is called a "relative distinguished
     * name". We just construct a sequence of length 1, containing a set of
     * size 1, containing a sequence containing OID_COMMON_NAME and the
     * given name as a string. Woo. */
    struct asn1 *outer = asn1_new_sequence();
    struct asn1 *set = asn1_new_set();
    struct asn1 *inner = asn1_new_sequence();
    asn1_add_child(inner, asn1_new_oid(OID_COMMON_NAME));
    asn1_add_child(inner, asn1_new_utf8string(name));
    asn1_finish(inner);
    asn1_add_child(set, inner);
    asn1_finish(set);
    asn1_add_child(outer, set);
    asn1_finish(outer);
    return outer;
}

/* The X.509 time encoding rules are ridiculous. ASN.1 has two separate time
 * types, UTCTime and GeneralizedTime, where GeneralizedTime is basically
 * UTCTime with 4-digit years instead of 2-digit and with the option for
 * fractional seconds. Obviously, the correct thing to do is simply use
 * GeneralizedTime for all time fields, since UTCTime cannot represent a lot of
 * dates, and in a just universe, that is what would happen.
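 *
 * To make the difference between the two types concrete, here is a small
 * standalone sketch (not the encoder used in this program; fmt_utctime() and
 * fmt_generalizedtime() are made-up names) that formats the same instant both
 * ways. It assumes the usual certificate form of each type: whole seconds only
 * and a trailing "Z" for UTC.
 *
```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

// Hypothetical helper: write t as a UTCTime string, YYMMDDHHMMSSZ.
// Returns the number of characters written, or 0 on failure.
size_t fmt_utctime(time_t t, char *out, size_t outlen)
{
    assert(out);
    struct tm *tm = gmtime(&t);
    if (!tm)
        return 0;
    return strftime(out, outlen, "%y%m%d%H%M%SZ", tm);
}

// Hypothetical helper: write t as a GeneralizedTime string, YYYYMMDDHHMMSSZ.
// Returns the number of characters written, or 0 on failure.
size_t fmt_generalizedtime(time_t t, char *out, size_t outlen)
{
    assert(out);
    struct tm *tm = gmtime(&t);
    if (!tm)
        return 0;
    return strftime(out, outlen, "%Y%m%d%H%M%SZ", tm);
}
```
 *
 * For 2024-03-02 23:13:25 UTC the first produces a 13-character string starting
 * with "24" and the second a 15-character string starting with "2024"; only the
 * year field differs.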
 *
 * Unfortunately, RFC 5280 4.1.2.5 requires the following unhinged behavior
 * instead: if the time's year is 2049 or less, the time is represented using
 * UTCTime, with the rule that if the year field in that UTCTime is >= 50 it is
 * interpreted as being 19YY, and otherwise it is interpreted as being 20YY, so
 * a year field of 87 means 1987 and a year field of 37 means 2037. If the
 * time's year is 2050 or more, instead one simply uses GeneralizedTime. This
 * insane hack with the UTCTime year field is local to X.509, too, so we can't
 * properly ask the ASN.1 serializer to sort it out for us.
 *
 * Oh, and to add insult to injury, these are represented on the wire as
 * strings. */
const unsigned long HOURS = 60 * 60;
const unsigned long DAYS = 24 * HOURS;
const unsigned long YEARS = 365 * DAYS;

struct asn1 *x509_new_time(time_t time) {
    struct tm *tm = gmtime(&time);
    if (tm->tm_year >= 150)
        /* Thank goodness! */
        return asn1_new_generalizedtime(time);
    if (tm->tm_year > 100)
        /* 2037 gets mapped to 1937, which then gets represented by
         * UTCTime as "37". Note that this will probably mess up the
         * exact day/time since years aren't all 100 * YEARS long, so,
         * you know, caveat emptor. */
        return asn1_new_utctime(time - 100 * YEARS);
    else
        /* 1987 is just treated as 1987 and UTCTime represents it as
         * "87". */
        return asn1_new_utctime(time);
}

struct asn1 *x509_new_algid() {
    /* An "algorithm" is identified by (not joking) an "algorithm
     * identifier", which is an OID, and also a set of parameters,
     * which are an ANY in the ASN.1 definition of this type. What
     * that ANY actually contains depends on the algorithm
     * identifier.
     *
     * For our particular case, we'll only support using elliptic
     * curve keys (OID 1.2.840.10045.2.1). We'll have to skip over
     * to RFC 3279 to find out what to put in the ANY:
     *
     * => https://www.rfc-editor.org/rfc/rfc3279.txt
     *
     * ...
     * which says that the parameters are one of: an explicit
     * set of EC parameters, a specific "named curve", or
     * "implicitlyCA", which means that the parameters field is
     * null and the EC parameters are inherited from the CA. As
     * far as I know everybody uses named curves for this. A named
     * curve is itself referenced by an OID, so we store that here
     * too. */
    struct asn1 *algid = asn1_new_sequence();
    asn1_add_child(algid, asn1_new_oid(OID_ECPUBKEY));
    asn1_add_child(algid, asn1_new_oid(OID_SECP384R1));
    asn1_finish(algid);
    return algid;
}

/* Now we need to talk about what's actually inside those certificates. For that
 * we have to turn to RFC 5280, "Internet X.509 Public Key Infrastructure
 * Certificate and Certificate Revocation List Profile".
 *
 * => https://www.rfc-editor.org/rfc/rfc5280.txt
 *
 * and head down to section 4.1, "basic certificate fields", where we will find
 * two structures: Certificate and TBSCertificate. TBSCertificate contains all
 * the "certificate stuff", and then is wrapped up and signed to form the
 * Certificate, so the general flow is that:
 * 1. You make a TBSCertificate
 * 2. You send it[fn6] to a CA who checks whatever (your identity, your ownership
 *    of the domain, whether your credit card payment has cleared, etc)
 * 3. The CA signs the TBSCertificate, producing a Certificate, and sends it
 *    back to you
 * 4. You put the Certificate into your web server */
struct asn1 *x509_new_tbscert(const struct keypair *kp, const char *subject) {
    struct asn1 *tbs = asn1_new_sequence();
    /* There is a version field right at the start, but because we're
     * leaving it at the default value (0, meaning X.509 v1), we omit it. */

    /* cert serial number */
    asn1_add_child(tbs, asn1_new_int(0));

    /* the "signature" field, which is actually the algorithm id for the
     * signature on this certificate, so it has to match the one on the
     * outer certificate structure.
     * In this case, it's represented as a sequence with only one OID in
     * it, although theoretically it could have more parameters or
     * whatever. */
    {
        struct asn1 *alg = asn1_new_sequence();
        asn1_add_child(alg, asn1_new_oid(OID_ECDSA_SHA256));
        asn1_finish(alg);
        asn1_add_child(tbs, alg);
    }

    /* issuer; TODO: support issuer != subject */
    asn1_add_child(tbs, x509_new_commonname(subject));

    /* validity: notBefore, then notAfter */
    {
        time_t now = time(NULL);
        time_t notbefore = now - 30 * DAYS;
        time_t notafter = now + 10 * YEARS;
        struct asn1 *validity = asn1_new_sequence();
        asn1_add_child(validity, x509_new_time(notbefore));
        asn1_add_child(validity, x509_new_time(notafter));
        asn1_finish(validity);
        asn1_add_child(tbs, validity);
    }

    /* subject */
    asn1_add_child(tbs, x509_new_commonname(subject));

    /* subjectPublicKeyInfo: the algorithm id, then the public key itself */
    {
        struct asn1 *spki = asn1_new_sequence();
        asn1_add_child(spki, x509_new_algid());
        asn1_add_child(spki, asn1_new_bitstring(kp->p.q, kp->p.qlen));
        asn1_finish(spki);
        asn1_add_child(tbs, spki);
    }

    /* If we were generating an X.509v3 certificate (we're not), we could
     * include a list of extensions here, which are a sequence of one or
     * more of:
     *
     *     SEQUENCE {
     *         OID id;
     *         BOOL critical;
     *         OCTETSTRING value;
     *     }
     *
     * where the values are themselves DER-encoded ASN.1 messages whose form
     * depends on the extension id, and the critical flag means "you must
     * understand this extension to use this certificate". This mechanism is
     * used for a LOT of stuff, most notably including subject alt names
     * (used to allow the same cert for multiple hostnames, very common in
     * HTTPS setups) and "basicConstraints", which are what actually mark
     * certs as *not* being CA certs (!) in the HTTPS PKI. Yikes. */

    asn1_finish(tbs);
    return tbs;
}

struct asn1 *cms_new_privkey(const struct keypair *kp) {
    /* RFC 5958, "Asymmetric Key Packages" (the successor to PKCS #8),
     * builds on what is called the "Cryptographic Message Syntax" or CMS.
     * That document specifies that you can put a single PrivateKeyInfo
     * between PEM "PRIVATE KEY" tags, and that a PrivateKeyInfo is:
     *     SEQUENCE {
     *         INT version;
     *         AlgorithmId algorithm;
     *         OCTETSTRING privatekey;
     *         ... and a couple of optional things we don't care about.
     *     }
     * so the encoding looks pretty straightforward here, but you can go read
     * RFC 5958 if you want to learn all the gory details:
     *
     * => https://www.rfc-editor.org/rfc/rfc5958.txt
     *
     * However... what is the privatekey OCTETSTRING supposed to actually
     * be? RFC 5958 says "the algorithm identifier dictates the format of
     * the key" with no further elaboration.
     *
     * To find the actual answer, we have to go look at RFC 5915, "Elliptic
     * Curve Private Key Structure":
     *
     * => https://www.rfc-editor.org/rfc/rfc5915.txt
     *
     * Which says that the OCTETSTRING is meant to contain:
     *
     *     SEQUENCE {
     *         INTEGER version (1)
     *         OCTETSTRING privateKey
     *         ECParameters parameters OPTIONAL
     *         BITSTRING publicKey OPTIONAL
     *     }
     *
     * so all we need to emit is the version and the actual private key - if
     * we leave the public key out, it can be re-derived from the private
     * key anyway. Let's do it: */
    struct asn1 *priv = asn1_new_sequence();
    asn1_add_child(priv, asn1_new_int(0));
    asn1_add_child(priv, x509_new_algid());
    {
        struct asn1 *pki = asn1_new_sequence();
        asn1_add_child(pki, asn1_new_int(1));
        asn1_add_child(pki, asn1_new_octetstring(kp->s.x, kp->s.xlen));
        asn1_finish(pki);
        asn1_add_child(priv, asn1_new_octetstring_wrapping(pki));
    }
    asn1_finish(priv);
    return priv;
}

void sha256(const uint8_t *in, size_t inlen, uint8_t *out) {
    br_sha256_context ctx;
    br_sha256_init(&ctx);
    br_sha256_update(&ctx, in, inlen);
    br_sha256_out(&ctx, out);
}

/* Time to generate an actual signature!
 */
struct asn1 *x509_new_signature(struct asn1 *tosign, const struct keypair *kp) {
    const br_ec_impl *impl;
    br_ecdsa_sign signer;
    uint8_t hash[br_sha256_SIZE];
    uint8_t sigbuf[512];
    size_t siglen;

    /* Serialize the whole tosign message into a buffer, then sha256 that
     * buffer, then discard the buffer - we only need the hash for the rest
     * of this. */
    {
        size_t len;
        uint8_t *buf = asn1_serialize_alloc(tosign, &len);
        sha256(buf, len, hash);
        free(buf);
    }

    impl = br_ec_get_default();
    signer = br_ecdsa_sign_raw_get_default();
    siglen = signer(impl, &br_sha256_vtable, hash, &kp->s, &sigbuf);

    /* An ECDSA signature is a pair of integers, called r and s. We get them
     * back from bearssl packed into a single buffer, padded with leading
     * zero bytes so that they're both the same length. The actual signature
     * format we need here is the ASN.1 signature format from RFC 3279
     * section 2.2.3, which looks like:
     *     SEQUENCE { r INTEGER, s INTEGER };
     * so we'll now need to pull both of those values out of the raw
     * signature, encode them as ASN.1 integers, then pack them into an
     * ASN.1 sequence.
     *
     * Do note that bearssl can do this for us, via either:
     *     br_ecdsa_sign_asn1_get_default() (which returns an ASN.1 signature)
     *     br_ecdsa_raw_to_asn1()
     * but we'll do it by hand here - I wrote a partial ASN.1 encoder and
     * I'm dang well going to use it. */
    struct asn1 *sig = asn1_new_sequence();
    asn1_add_child(sig, asn1_new_bigint(sigbuf, siglen / 2));
    asn1_add_child(sig, asn1_new_bigint(sigbuf + siglen / 2, siglen / 2));
    asn1_finish(sig);

    struct asn1 *wrappedsig = asn1_new_bitstring_wrapping(sig);
    asn1_free(sig);
    return wrappedsig;
}

/* The cert itself is super simple: the tbscert, then an algorithm id for what
 * type of signature algorithm we used (ECDSA-with-SHA256, natch), then the
 * signature itself. Easy!
 */
struct asn1 *x509_new_cert(struct asn1 *tbscert, const struct keypair *kp) {
    struct asn1 *cert = asn1_new_sequence();
    asn1_add_child(cert, tbscert);
    {
        struct asn1 *seq = asn1_new_sequence();
        asn1_add_child(seq, asn1_new_oid(OID_ECDSA_SHA256));
        asn1_finish(seq);
        asn1_add_child(cert, seq);
    }
    asn1_add_child(cert, x509_new_signature(tbscert, kp));
    asn1_finish(cert);
    return cert;
}

void selftest();

int main() {
    struct keypair kp;

    selftest();
    keypair_gen(&kp);

    struct asn1 *tbscert = x509_new_tbscert(&kp, "testing");
    /* In theory, one could pass a different keypair here to sign the cert
     * with some other key; that would generate a cert signed by a CA. Here,
     * we're generating a self-signed cert instead. */
    struct asn1 *cert = x509_new_cert(tbscert, &kp);

    /* Generate the private key as a PrivateKeyInfo (RFC 5958): */
    struct asn1 *priv = cms_new_privkey(&kp);

    asn1_emit_pem(cert, "CERTIFICATE");
    asn1_emit_pem(priv, "PRIVATE KEY");

    /* Ta-da! Cert and private key generated, and ready to be slurped up by
     * any SSL library. */
    return 0;
}

/* This function takes an ASN.1 message and emits it as a PEM message. To do
 * that, it has to serialize the entire thing, then base64 encode its body. */
void asn1_emit_pem(const struct asn1 *asn1, const char *tag) {
    printf("-----BEGIN %s-----\n", tag);

    /* What column to wrap the base64 text at; this must be divisible by 4
     * or bad things will happen. If this was C11 we'd write:
     *     enum { WRAP_AT = 64 };
     *     static_assert(!(WRAP_AT % 4));
     * but it's not so here we are. */
    const size_t WRAP_AT = 64;
    const char *B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                      "abcdefghijklmnopqrstuvwxyz"
                      "0123456789+/";

    size_t written = 0;
    size_t len;
    uint8_t *abuf = asn1_serialize_alloc(asn1, &len);
    uint8_t *buf = abuf;

    while (len > 0) {
        /* Every 3 input bytes turn into 4 base64 characters, and
         * missing bytes at the end are treated as being 0 for this
         * encoding. */
        uint8_t d0 = buf[0];
        uint8_t d1 = len > 1 ? buf[1] : 0;
        uint8_t d2 = len > 2 ?
            buf[2] : 0;

        /* aaaaaa aabbbb bbbbcc cccccc */
        char c0 = B64[(d0 >> 2)];
        char c1 = B64[((d0 & 0x3) << 4) | (d1 >> 4)];
        char c2 = B64[((d1 & 0xf) << 2) | (d2 >> 6)];
        char c3 = B64[d2 & 0x3f];

        /* Missing bytes are then dropped and replaced with '=' for
         * padding. */
        if (len == 1)
            c2 = c3 = '=';
        if (len == 2)
            c3 = '=';

        printf("%c%c%c%c", c0, c1, c2, c3);
        written += 4;
        if (written % WRAP_AT == 0)
            printf("\n");

        buf += 3;
        len -= (len > 3) ? 3 : len;
    }

    if (written > 0 && written % WRAP_AT)
        printf("\n");
    printf("-----END %s-----\n", tag);
    free(abuf);
}

/* Generate a keypair using bearssl and stash it in kp. */
void keypair_gen(struct keypair *kp) {
    br_hmac_drbg_context rng;
    const br_ec_impl *impl;

    {
        /* Set up rng, which is our cryptographically secure random
         * number generator. */
        br_prng_seeder seeder;
        if (!(seeder = br_prng_seeder_system(NULL)))
            ouch("no system rng");
        br_hmac_drbg_init(&rng, &br_sha256_vtable, NULL, 0);
        if (!seeder(&rng.vtable))
            ouch("can't seed rng");
    }

    /* Now generate the private key and compute the matching public key. */
    impl = br_ec_get_default();
    br_ec_keygen(&rng.vtable, impl, &kp->s, &kp->sbuf, BR_EC_secp384r1);
    br_ec_compute_pub(impl, &kp->p, &kp->pbuf, &kp->s);
}

/* A test helper - serialize a and make sure the encoding matches b of length
 * len.
 */
void t_match(struct asn1 *a, const uint8_t *b, size_t len) {
    uint8_t mbuf[4096];
    size_t mlen;

    mlen = asn1_serialize(a, mbuf, sizeof(mbuf));
    if (mlen != len || memcmp(mbuf, b, len)) {
        printhex("a", mbuf, mlen);
        printhex("b", b, len);
        assert(0);
    }
}

void selftest() {
    {
        struct asn1 *a = asn1_new_int(0xc36d6508);
        const uint8_t b[] = { 0x02, 0x05, 0x00, 0xc3, 0x6d, 0x65, 0x08 };
        t_match(a, b, sizeof(b));
    }

    {
        struct asn1 *a = asn1_new_int(0);
        const uint8_t b[] = { 0x02, 0x01, 0x00 };
        t_match(a, b, sizeof(b));
    }

    {
        struct asn1 *a = asn1_new_oid(OID_COMMON_NAME);
        const uint8_t b[] = { 0x06, 0x03, 0x55, 0x04, 0x03 };
        t_match(a, b, sizeof(b));
    }

    {
        struct asn1 *a = asn1_new_utf8string("Hello!");
        const uint8_t b[] = {
            0x0c, 0x06, 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x21,
        };
        t_match(a, b, sizeof(b));
    }

    {
        struct asn1 *a = asn1_new_sequence();
        asn1_add_child(a, asn1_new_int(0x42));
        asn1_add_child(a, asn1_new_int(0xcc));
        asn1_finish(a);
        const uint8_t b[] = {
            /* remember that our ints are unsigned but ASN.1 ints
             * are signed, so 0xcc needs a leading 0 byte */
            0x30, 0x07, 0x02, 0x01, 0x42, 0x02, 0x02, 0x00, 0xcc,
        };
        t_match(a, b, sizeof(b));
    }

    {
        struct asn1 *a1 = asn1_new_sequence();
        asn1_add_child(a1, asn1_new_int(0x42));
        asn1_add_child(a1, asn1_new_int(0xcc));
        asn1_finish(a1);
        struct asn1 *a0 = asn1_new_bitstring_wrapping(a1);
        const uint8_t b[] = {
            0x03, 0x0a, 0x00,
            0x30, 0x07, 0x02, 0x01, 0x42, 0x02, 0x02, 0x00, 0xcc,
        };
        t_match(a0, b, sizeof(b));
    }
}

/* Allocate a new ASN.1 primitive type, including a buffer for its encoded
 * value. Primitive types are always considered finished since they can't have
 * any children and only have an immediate value.
 */
struct asn1 *_asn1_new_prim(enum asn1type type, ssize_t len) {
	struct asn1 *a = malloc(sizeof *a);
	if (!a)
		ouch("out of memory");
	memset(a, 0, sizeof(*a));
	/* malloc(0) is allowed to return NULL, so always ask for at least one
	 * byte; otherwise an empty value (like "") would look like an
	 * allocation failure. */
	a->val.b = malloc(len > 0 ? (size_t)len : 1);
	if (!a->val.b)
		ouch("out of memory");
	memset(a->val.b, 0, len);
	a->len = len;
	a->type = type;
	a->flags |= ASN1F_FINISHED;
	return a;
}

struct asn1 *_asn1_new_comp(enum asn1type type) {
	struct asn1 *a = malloc(sizeof *a);
	if (!a)
		ouch("out of memory");
	memset(a, 0, sizeof(*a));
	a->type = type;
	a->len = -1;
	a->val.l = NULL;
	return a;
}

struct asn1 *asn1_new_int(uint64_t val) {
	/* Convert the value into big-endian bytes, then treat the whole
	 * thing as a bigint. */
	uint8_t bytes[8];
	for (size_t i = sizeof(bytes); i; i--) {
		bytes[i - 1] = val & 0xff;
		val = val >> 8;
	}
	return asn1_new_bigint(bytes, sizeof(bytes));
}

struct asn1 *asn1_new_bigint(const uint8_t *v, size_t len) {
	/* X.690 section 8.3.2 requires that the bits of the first octet and
	 * the high bit of the second octet are neither all 0 nor all 1; since
	 * v is unsigned we don't want to drop leading all-1 bytes, but we do
	 * want to drop leading all-0 bytes. Do that here, until we find a
	 * nonzero byte or run out of bytes. Check len before touching *v so
	 * that an all-zero input never reads past the end of the buffer. */
	while (len && !*v) {
		v++;
		len--;
	}
	/* A wrinkle: if the top bit is set, we don't want that to become the
	 * sign bit and make the value negative, so we need to prepend a zero
	 * byte; ditto if there are no bytes in the result, we have to put in a
	 * zero byte so it has an actual value. */
	size_t prefix = (!len || v[0] & 0x80) ? 1 : 0;
	struct asn1 *a = _asn1_new_prim(ASN1_INTEGER, len + prefix);
	memcpy(a->val.b + prefix, v, len);
	return a;
}

size_t _asn1_oid_component(int component, uint8_t *buf, size_t len);

struct asn1 *asn1_new_oid(const oid oid) {
	/* When we're encoding an OID, each component is encoded as a
	 * variable-length integer, where each byte uses its high bit to
	 * indicate whether another byte follows and its low 7 bits to encode 7
	 * bits of the component.
	 * Also, the first two components (which are always present) are
	 * encoded as a single one - if the first two components are x and y,
	 * the first encoded component is x * 40 + y. I suppose this was done
	 * because in practice the values of x are always in { 0, 1, 2 } and it
	 * saves space.
	 *
	 * Unfortunately this encoding is fiddly enough that we have to do it
	 * into a temporary buffer here, so we can figure out how large the real
	 * buffer should be to store it in. Also, the 7-bit variable length
	 * encoding is actually done by _asn1_oid_component() just below. */
	uint8_t oidbuf[256];
	size_t b = _asn1_oid_component(oid[0] * 40 + oid[1], oidbuf,
	    sizeof(oidbuf));
	for (size_t i = 2; oid[i] != -1; i++) {
		/* b bytes of oidbuf are already used, so exactly
		 * sizeof(oidbuf) - b bytes remain for this component. */
		b += _asn1_oid_component(oid[i], oidbuf + b,
		    sizeof(oidbuf) - b);
	}
	struct asn1 *a = _asn1_new_prim(ASN1_OID, b);
	memcpy(a->val.b, oidbuf, b);
	return a;
}

size_t _asn1_oid_component(int comp, uint8_t *buf, size_t len) {
	/* Fill pb from the back with 7-bit groups, least significant first,
	 * setting the continuation bit on every byte except the last one. */
	uint8_t pb[6];
	size_t i = sizeof(pb);
	while (i > 0) {
		pb[--i] = comp & 0x7f;
		if (i != sizeof(pb) - 1)
			pb[i] |= 0x80;
		comp = comp >> 7;
		if (comp <= 0)
			break;
	}
	assert(len >= sizeof(pb) - i);
	memcpy(buf, pb + i, sizeof(pb) - i);
	return sizeof(pb) - i;
}

struct asn1 *asn1_new_bitstring(const uint8_t *bytes, size_t len) {
	struct asn1 *a = _asn1_new_prim(ASN1_BITSTRING, len + 1);
	/* X.690 8.6.2.2: the first octet encodes how many bits in the final
	 * byte are unused. Since we don't support bit strings of lengths that
	 * aren't a multiple of 8, that count is always 0 here. I'm actually
	 * unaware of any uses of BITSTRING in X.509 that are not really bytes.
	 */
	a->val.b[0] = 0;
	memcpy(a->val.b + 1, bytes, len);
	return a;
}

struct asn1 *asn1_new_octetstring(const uint8_t *bytes, size_t len) {
	struct asn1 *a = _asn1_new_prim(ASN1_OCTETSTRING, len);
	memcpy(a->val.b, bytes, len);
	return a;
}

struct asn1 *asn1_new_utf8string(const char *str) {
	size_t len = strlen(str);
	struct asn1 *a = _asn1_new_prim(ASN1_UTF8STRING, len);
	memcpy(a->val.b, str, len);
	return a;
}

struct asn1 *asn1_new_utctime(time_t time) {
	struct tm *tm = gmtime(&time);
	char buf[32];
	/* UTCTime only has room for a two-digit year, and tm_year counts
	 * years since 1900 (so 2024 is 124); reduce it mod 100 or we'd emit
	 * three digits. */
	size_t len = snprintf(buf, sizeof(buf), "%02d%02d%02d%02d%02d%02dZ",
	    tm->tm_year % 100, tm->tm_mon + 1, tm->tm_mday, tm->tm_hour,
	    tm->tm_min, tm->tm_sec);
	struct asn1 *a = _asn1_new_prim(ASN1_UTCTIME, len);
	memcpy(a->val.b, buf, len);
	return a;
}

struct asn1 *asn1_new_generalizedtime(time_t time) {
	struct tm *tm = gmtime(&time);
	char buf[32];
	size_t len = snprintf(buf, sizeof(buf), "%04d%02d%02d%02d%02d%02dZ",
	    tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday, tm->tm_hour,
	    tm->tm_min, tm->tm_sec);
	/* GeneralizedTime has its own universal tag (0x18), distinct from
	 * UTCTime's 0x17. This assumes enum asn1type defines
	 * ASN1_GENERALIZEDTIME with that value. */
	struct asn1 *a = _asn1_new_prim(ASN1_GENERALIZEDTIME, len);
	memcpy(a->val.b, buf, len);
	return a;
}

struct asn1 *asn1_new_sequence() {
	return _asn1_new_comp(ASN1_SEQUENCE);
}

struct asn1 *asn1_new_set() {
	return _asn1_new_comp(ASN1_SET);
}

struct asn1 *asn1_new_bitstring_wrapping(struct asn1 *child) {
	assert(child->flags & ASN1F_FINISHED);
	size_t len;
	uint8_t *buf = asn1_serialize_alloc(child, &len);
	struct asn1 *a = asn1_new_bitstring(buf, len);
	free(buf);
	return a;
}

struct asn1 *asn1_new_octetstring_wrapping(struct asn1 *child) {
	assert(child->flags & ASN1F_FINISHED);
	size_t len = asn1_encoded_length(child);
	struct asn1 *a = _asn1_new_prim(ASN1_OCTETSTRING, len);
	asn1_serialize(child, a->val.b, len);
	return a;
}

void asn1_add_child(struct asn1 *parent, struct asn1 *child) {
	struct asn1 **p = &parent->val.l;
	/* Walk down the child list until we find a next pointer that is NULL,
	 * then scribble over that next pointer with the new node we're trying
	 * to link.
	 * This is a C idiom for linking to the tail of a singly-linked
	 * list. */
	while (*p)
		p = &((*p)->next);
	*p = child;
	child->next = NULL;
}

int asn1_is_primitive(const struct asn1 *asn1) {
	/* ASN.1 uses bit 5 of the type byte (mask 0x20, counting from bit 0;
	 * X.690 counts from 1 and calls it bit 6) for indicating whether the
	 * type is composite or not, yay! */
	return !(asn1->type & 0x20);
}

void asn1_free(struct asn1 *asn1) {
	if (asn1_is_primitive(asn1)) {
		free(asn1->val.b);
	} else {
		struct asn1 *c = asn1->val.l;
		while (c) {
			struct asn1 *n = c->next;
			asn1_free(c);
			c = n;
		}
	}
	free(asn1);
}

void asn1_finish(struct asn1 *node) {
	assert(!(node->flags & ASN1F_FINISHED));
	node->flags |= ASN1F_FINISHED;
	/* Note: in theory, DER requires that the values inside a SET OF type
	 * are placed in a specific order by lexicographically comparing their
	 * encoded values. We don't do that here because the SETs we use only
	 * ever contain one element. */
	size_t clen = 0;
	for (struct asn1 *c = node->val.l; c; c = c->next)
		clen += asn1_encoded_length(c);
	node->len = clen;
}

/* And that's it for now! Thanks for reading :)
 *
 * For your convenience here are all the standards referenced in this post:
 *
 * => https://www.rfc-editor.org/rfc/rfc2986.txt cert signing requests
 * => https://www.rfc-editor.org/rfc/rfc3278.txt ECC in CMS
 * => https://www.rfc-editor.org/rfc/rfc3279.txt X.509 algorithm identifiers
 * => https://www.rfc-editor.org/rfc/rfc5280.txt X.509 cert and CRL profile
 * => https://www.rfc-editor.org/rfc/rfc5480.txt ECC public keys
 * => https://www.rfc-editor.org/rfc/rfc5915.txt ECC private keys
 * => https://www.rfc-editor.org/rfc/rfc5958.txt asymmetric key packages
 * => https://www.rfc-editor.org/rfc/rfc7468.txt PEM
 *
 * => https://www.itu.int/rec/T-REC-X.500 ITU X.500: the OSI Directory
 * => https://www.itu.int/rec/T-REC-X.509 ITU X.509: X.509 certs (surprise!)
 * => https://www.itu.int/rec/T-REC-X.520 ITU X.520: Directory attribute types
 * => https://www.itu.int/rec/T-REC-X.690 ITU X.690: ASN.1 encoding rules
 *
 * [fn1]: the basic types and stuff are self-describing, but the schema language
 *        is very weird and there are often constraints which are part of the
 *        schema but *not* described on the wire; also, when you use CHOICE or
 *        ANY, there's no discriminant between the various choices, so
 *        sometimes you might just have to guess what's there from context. To
 *        make matters worse, to avoid introducing dependencies between specs,
 *        it is common to smuggle messages around as OCTETSTRINGs.
 *
 * [fn2]: in actuality, the ASN.1 tags are varints, and there are some rules
 *        for how bits in the first byte of the varint are used to give them
 *        crude namespaces. Bits 7 and 6 are the "class" bits of the tag (of
 *        which only "universal" gets any use here), bit 5 is the "constructed"
 *        bit (meaning like, composite vs primitive type), and bits 4 through
 *        0 are part of the tag number. In theory, you could create a LOT of
 *        user-defined types and differentiate them using tags; in practice,
 *        nobody does that, and the fact that the tag is a varint is an
 *        obscure feature.
 *
 * [fn3]: OIDs are from an era where it was considered extremely trendy to take
 *        every concept that exists and try to place them into a single
 *        hierarchy, as evidenced by the way X.500 worked as well. OIDs actually
 *        are used, but mostly for items that really are in tightly-defined
 *        hierarchies, and it will not surprise you to hear that the
 *        awe-inspiring power of the system is pretty sparsely applied.
 *
 * [fn4]: They actually do often stick some other stuff in there, especially
 *        "O" (meaning "Organization") and "OU" (meaning "Organizational
 *        Unit", a satisfyingly vague committee term). For example, the Let's
 *        Encrypt CA certificate my website uses has an O of "Let's Encrypt",
 *        a C (country) of "US", and a common name of "R3" (?).
 *
 * [fn5]: The situation with ASN.1 and strings could fill an entire blog post.
 *        The standard is old enough, and hairy enough, that when it was
 *        drafted it was not yet obvious that UTF-8 was the correct way to do
 *        strings, so there are many, many, many types of strings in ASN.1 with
 *        different arcane rules.
 *
 * [fn6]: In fact you send them a CSR, or "Certificate Signing Request", which
 *        is yet another ASN.1 message defined by yet another RFC: RFC 2986,
 *        section 4. You might've seen these as ".csr" files.
 */