/* X.509 Certificates
 * This blog post is a C program - you can compile it with:
 * 	cc -o post post.c -lbearssl
 * It requires bearssl to compile and run.
 *
 * If you've ever set up a web server with https support, you've probably at
 * some point been called upon to interact with cryptic files with names ending
 * in .pem, which contain blocks of base64ed text. I've seen these a bunch of
 * times myself and always vaguely known that there's like, a public key and a
 * signature in there, but not really what's going on in any detail. I decided
 * to find out, by setting out to write a program that would generate a new
 * certificate good enough to be understood by OpenSSL. This is that program.
 *
 * We're gonna use bearssl for the actual crypto stuff (of which there won't be
 * all that much) because the details of how to generate elliptic curve keys and
 * such aren't germane to what we're doing. I did also rampantly ignore errors
 * and other failure modes, so maybe don't use this in production.
 *
 * Alright, let's get right into the C program part with some header files!
 */

#include <assert.h>
#include <bearssl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>    /* memcpy, memset, memcmp */
#include <sys/types.h> /* ssize_t */
#include <time.h>

/* It wouldn't be a real blog post without some seriously budget error handling,
 * so here we are: */
void ouch(const char *msg) {
	fprintf(stderr, "%s\n", msg);
	exit(1);
}

/* And a low-rent debugging aid, too: */
void printhex(const char *pre, const uint8_t *bytes, size_t len) {
	const size_t LINE = 32;
	for (size_t i = 0; i < len; i++) {
		if (!(i % LINE)) {
			if (i)
				fprintf(stderr, "\n");
			fprintf(stderr, "%s:", pre);
		}
		fprintf(stderr, " %02x", bytes[i]);
	}
	fprintf(stderr, "\n");
}

/* Alright, on to the main stuff :)
 *
 * X.509 is a standard that defines a particular widely-used digital certificate
 * format, used for TLS (so HTTPS), S/MIME (so one type of encrypted email), and
 * a lot of other uses.
 *
 * X.509 heavily uses a data encoding called ASN.1 ("Abstract Syntax Notation
 * 1"), which is a way of serializing and deserializing messages, like a very
 * old and complicated predecessor to JSON or Protocol Buffers. In X.509,
 * certificates, cryptographic keys, and a bunch of other things are defined as
 * ASN.1 messages. ASN.1 has a bunch of possible encodings; the one that X.509
 * uses is called "DER" (Distinguished Encoding Rules), which is
 * deliberately as unambiguous as possible, meaning messages can generally only
 * be encoded in a single way. This is intended to make it easy to check
 * signatures on messages, since a given message should always serialize to the
 * same bytes.
 *
 * Since blobs of binary data are a bit unwieldy, especially for sending over
 * email, there's a commonly-used standard called PEM ("Privacy-Enhanced Mail")
 * which defines a way to encode... well, anything, but most often ASN.1
 * messages for transport over email. To do that, they base64 the data payload
 * and wrap it in a header/footer to make it easy to extract from a larger
 * message.
 *
 * In fact, let's look at an example PEM file. If you were to generate a new
 * certificate and export it as a PEM file it might look like this:
 *
 * -----BEGIN CERTIFICATE-----
 * MIIBWzCB4gIUEjf0/5Z1iEtGwdVVtpwEVqnItQ4wCgYIKoZIzj0EAwIwEjEQMA4G
 * A1UEAwwHdGVzdGluZzAeFw0yNDAzMDIyMzEzMjVaFw0yNDA0MDEyMzEzMjVaMBIx
 * EDAOBgNVBAMMB3Rlc3RpbmcwdjAQBgcqhkjOPQIBBgUrgQQAIgNiAASdNB7Jt9ni
 * 0NSruQhTGoNyQVjuVLvQAGKITVuUwDd4XwzXToE1FZc6RgAo4tMX5uMh2VIQwxNo
 * BJ5qc7bXwJFeam0j9MKHjDmPhmhWg+3HKO3WK4Ihr467A6DvESybjpYwCgYIKoZI
 * zj0EAwIDaAAwZQIwLw3igZeDUJz0FdnrClommvBzBJ7tn8JkzAl9TrO007vg3fVe
 * 2IcIHA7qsxUqEwcvAjEAw21lCAboqosIwfJnGY3T/byfp97HzXcl4+JCUHyiQ6mD
 * Fzajy+EWuCSWeDpxznk0
 * -----END CERTIFICATE-----
 *
 * This was generated by:
 *   openssl genpkey -algorithm EC -out key.pem \
 *       -pkeyopt ec_paramgen_curve:P-384 \
 *       -pkeyopt ec_param_enc:named_curve
 *   openssl x509 -new -subj /CN=testing/ -key key.pem -out cert.pem
 *
 * And the corresponding key.pem looks like this:
 * -----BEGIN PRIVATE KEY-----
 * MIG2AgEAMBAGByqGSM49AgEGBSuBBAAiBIGeMIGbAgEBBDAOs2FsRLcx6W8yUktn
 * gPmo1X+IkJ/lGlxh4wVLzrn45TTy3YrzDpCc/51zx4VC/GihZANiAASdNB7Jt9ni
 * 0NSruQhTGoNyQVjuVLvQAGKITVuUwDd4XwzXToE1FZc6RgAo4tMX5uMh2VIQwxNo
 * BJ5qc7bXwJFeam0j9MKHjDmPhmhWg+3HKO3WK4Ihr467A6DvESybjpY=
 * -----END PRIVATE KEY-----
 *
 * If we were to un-base64 the lines in between the BEGIN and ENDs there, we'd
 * indeed see ASN.1 messages, although we wouldn't (yet!) know how to interpret
 * them.
 *
 * Let's start with an explanation of (part of) ASN.1, since it's so central to
 * what we're doing here.
 */
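
/* To make the "un-base64" step concrete, here's a minimal sketch of a base64
 * decoder. b64_decode() is a made-up helper, not something the program below
 * uses, and it assumes well-formed input with no line breaks in it. It lives
 * inside this comment (and uses // comments) so that it doesn't get compiled
 * into the post:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: decode base64 text into out, stopping at '=' padding
// or at the end of the string. Returns the number of bytes written, or 0 on
// an invalid character or a full output buffer.
size_t b64_decode(const char *in, uint8_t *out, size_t outlen) {
	static const char B64[] =
		"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
		"abcdefghijklmnopqrstuvwxyz"
		"0123456789+/";
	uint32_t acc = 0; // bit accumulator, newest bits at the bottom
	int bits = 0;     // how many bits in acc haven't been emitted yet
	size_t n = 0;

	assert(in && out);

	for (; *in && *in != '='; in++) {
		const char *p = strchr(B64, *in);
		if (!p)
			return 0;

		// Each character contributes 6 bits...
		acc = (acc << 6) | (uint32_t)(p - B64);
		bits += 6;

		// ...and every time 8 or more have piled up, a byte falls out.
		if (bits >= 8) {
			if (n == outlen)
				return 0;
			bits -= 8;
			out[n++] = (acc >> bits) & 0xff;
		}
	}
	return n;
}
```

 * Feeding it the first eight characters of the certificate above, "MIIBWzCB",
 * gives the bytes 30 82 01 5b 30 81, which the next section explains how to
 * read.
 */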

/* Regrettably the standards for ASN.1 are produced by the ITU (International
 * Telecommunication Union): X.680 covers the notation itself and X.690 covers
 * the encodings, and X.690 is one of the most opaque standards documents I've
 * ever read:
 *
 * => https://www.itu.int/rec/T-REC-X.690-202102-I/en
 *
 * ASN.1 is extremely complicated and has lots of variants and encodings and
 * stuff, but we only have to care about DER, which attempts to provide a
 * single fairly compact encoding for any given message. At its core, ASN.1 in
 * DER and ignoring all of the bizarre options is a self-describing[fn1]
 * type-length-value scheme that looks like this:
 *   tag: 1 byte [fn2]
 *   length: variable
 *   payload: bytes
 * where the length is either in "short form" (up to 0x7f, with the high bit
 * clear) or "long form" (high bit set, low 7 bits encode how many bytes are
 * used for the length (!)).
 *
 * If you're wondering if it's annoying that the length is of variable size: it
 * is. Until you've encoded the payload (which usually involves encoding nested
 * messages) you don't know how long the length is going to need to be, so
 * you're forced to either do a first pass where you size everything then a
 * second pass where you actually encode, or to guess how big the size fields
 * are and then adjust if necessary. The code below does the first sizing pass,
 * but it's very annoying because you have to buffer the whole message in
 * memory.
 *
 * There's one other important and fundamental concept we need to know about:
 * OIDs, or "object identifiers". Lots of things in both ASN.1 and X.509 are
 * identified by "object IDs", which are basically a hierarchical tree[fn3] of
 * namespaces indexed by integers. An OID is usually written as a dotted
 * sequence of unsigned integers, like 1.2.36.1 or whatnot, which means:
 *
 *   1 (= ISO)
 * . 2 (= member body)
 * . 36 (= australia)
 * . 1 (= government)
 *
 * so 1.2.36.1 is the object identifier for the Australian government. Another
 * example is 1.2.840.10045.2.1, which identifies elliptic-curve public keys
 * via:
 *
 *   1 (ISO)
 * . 2 (member body)
 * . 840 (US)
 * . 10045 (ANSI X9.62, the ECDSA standards body)
 * . 2 (keyType)
 * . 1 (ecPublicKey).
 *
 * There are OIDs for types, for cryptographic algorithms, for standards bodies,
 * for places, for... you get the idea. There are a lot of them.
 *
 * Here, we use an array of 16 ints, and terminate the OID with a -1, since the
 * components of OIDs are always nonnegative. The components are allowed to be
 * arbitrarily big, and the OIDs arbitrarily long, but this works for our use
 * cases. */
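
/* For a feel of what an OID turns into on the wire, here's a minimal sketch
 * of the content encoding as I understand it from X.690 8.19: the first two
 * components are merged into the single value 40 * first + second, and every
 * value is then written in base 128, most significant group first, with the
 * high bit set on every byte except the last. oid_encode_body() is a made-up
 * name, it doesn't validate the first two components, and it isn't
 * necessarily how asn1_new_oid() below goes about it. As before it's tucked
 * into a comment so it stays out of the real program:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: write the content bytes (no tag, no length) of an OID
// given as a -1-terminated int array. Returns the number of bytes written, or
// 0 if the OID has fewer than two components or out is too small.
size_t oid_encode_body(const int *oid, uint8_t *out, size_t outlen) {
	size_t n = 0;

	assert(oid && out);
	if (oid[0] < 0 || oid[1] < 0)
		return 0;

	for (size_t i = 1; oid[i] >= 0; i++) {
		// The first two components share one value.
		unsigned v = (i == 1)
			? 40u * (unsigned)oid[0] + (unsigned)oid[1]
			: (unsigned)oid[i];

		// Count the 7-bit groups needed for v (always at least one).
		int groups = 1;
		for (unsigned t = v >> 7; t; t >>= 7)
			groups++;
		if (outlen - n < (size_t)groups)
			return 0;

		// Emit the groups most significant first, with the high bit
		// set on every byte except the last.
		for (int g = groups - 1; g >= 0; g--)
			out[n++] = ((v >> (7 * g)) & 0x7f) | (g ? 0x80 : 0x00);
	}
	return n;
}
```

 * With that, 2.5.4.3 comes out as 55 04 03 (the same bytes the self-test
 * further down expects after the tag and length), and 1.2.840.10045.2.1 comes
 * out as 2a 86 48 ce 3d 02 01.
 */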

typedef int oid[16];

/* And here are the OIDs we'll be needing here, stored as described above. These
 * values come from:
 *   1.2.840.10045.2.1: called "id-ecPublicKey" in RFC 3278, section 8.1
 *   1.3.132.0.34: RFC 5480 2.1.1.1, set of named curves
 *   1.2.840.10045.4.3.2: RFC 5480 2.1.2; this document is also authoritative
 *                        for which parameters you need to use with the
 *                        various crypto types in X.509!
 *   2.5.4.3: from ITU-T X.520
 */

const oid OID_ECPUBKEY = { 1, 2, 840, 10045, 2, 1, -1 };
const oid OID_SECP384R1 = { 1, 3, 132, 0, 34, -1 };
const oid OID_ECDSA_SHA256 = { 1, 2, 840, 10045, 4, 3, 2, -1 };
const oid OID_COMMON_NAME = { 2, 5, 4, 3, -1 };

/* Here are the ASN.1 value types we support encoding, although there are a lot
 * of other ones (many types of strings[fn5], reals, other composite types,
 * etc): */

enum asn1type {
	ASN1_INTEGER         = 0x02,
	ASN1_BITSTRING       = 0x03,
	ASN1_OCTETSTRING     = 0x04,
	ASN1_OID             = 0x06,
	ASN1_UTF8STRING      = 0x0c,

	ASN1_SEQUENCE        = 0x30,

	/* For reasons that are absolutely obscure to me, X.690, the official
	 * standard for the ASN.1 encodings, does not actually say what the tag
	 * values are for these types. It is completely baffling - the
	 * documentation about how to encode them is simply silent on the
	 * matter, because the universal tag numbers are assigned over in X.680
	 * instead. I figured these out by looking at certs generated by
	 * OpenSSL. */
	ASN1_UTCTIME         = 0x17,
	ASN1_GENERALIZEDTIME = 0x18,
	ASN1_SET             = 0x31,
};

struct asn1 {
	enum asn1type type;

	/* The length is always present for some types; for others it needs to
	 * be computed depending on the value. Lack of a computed length is
	 * marked by a negative length. */
	ssize_t len;
	union {
		/* If this value is a primitive type, it has a definite length
		 * and this field points to a buffer containing its encoded
		 * form. */
		uint8_t *b;

		/* If this value is a composite type (a SEQUENCE or SET or
		 * whatever), this points to its first child value; subsequent
		 * children are linked through the next field on the children.
		 */
		struct asn1 *l;
	} val;

	/* If this value is part of a list, this field is used to link to the
	 * next element in that list. */
	struct asn1 *next;

	/* Flags for internal use of the ASN.1 encoder. */
	unsigned int flags;
};

enum {
	/* A 'struct asn1' is finished if its value is complete and its length
	 * is known. Primitive types are always finished, composite types have
	 * to be explicitly finished via asn1_finish() to indicate that no more
	 * children will be added. */
	ASN1F_FINISHED = 0x01,
};

/* And here are our primitives for managing them. We're going to rampantly use
 * heap allocations here to keep the code as simple as possible, but in a real
 * X.509 library you probably wouldn't want to do this.
 *
 * The implementations of those functions are down below, since they're not
 * interesting - they just make a new asn1 object and fill in some of its
 * fields. */
struct asn1 *asn1_new_int(uint64_t val);
struct asn1 *asn1_new_bigint(const uint8_t *bytes, size_t len);
struct asn1 *asn1_new_oid(const oid oid);
struct asn1 *asn1_new_bitstring(const uint8_t *bytes, size_t len);
struct asn1 *asn1_new_octetstring(const uint8_t *bytes, size_t len);
struct asn1 *asn1_new_utf8string(const char *str);
struct asn1 *asn1_new_utctime(time_t time);
struct asn1 *asn1_new_generalizedtime(time_t time);
struct asn1 *asn1_new_sequence();
struct asn1 *asn1_new_set();

/* This is a nasty hack - some fields in X.509 certs are actually themselves
 * ASN.1 values that are encoded, then have their encoded form wrapped in a
 * bitstring. When we do this, we serialize the child immediately rather than
 * delaying serializing it! */
struct asn1 *asn1_new_bitstring_wrapping(struct asn1 *child);
struct asn1 *asn1_new_octetstring_wrapping(struct asn1 *child);

void asn1_add_child(struct asn1 *parent, struct asn1 *child);
int asn1_is_primitive(const struct asn1 *asn1);

void asn1_free(struct asn1 *a);

/* When we're done adding children to a node, we need to finish it - this also
 * computes the node's length. */
void asn1_finish(struct asn1 *node);

size_t asn1_serialize(const struct asn1 *root, uint8_t *buf, size_t len);
uint8_t *asn1_serialize_alloc(const struct asn1 *root, size_t *len);

/* And we'll need one helper function, which will tell us how many bytes the
 * entire encoded form of an asn1 value will take, including its tag and
 * (variable length) length: */

size_t asn1_encoded_length(const struct asn1 *root);

/* And last (but definitely not least), we're going to want the ability to emit
 * an ASN.1 message as a PEM file, so that we can actually emit the generated
 * certificate and key. This function will do the base64 encoding and wrap the
 * block of base64 text in PEM-style BEGIN/END lines. The exact format is
 * defined in RFC 7468, although it's quite simple:
 *
 * => https://www.rfc-editor.org/rfc/rfc7468.txt
 */
void asn1_emit_pem(const struct asn1 *asn1, const char *tag);

/* Alright, let's get to implementing the more complicated ASN.1 stuff -
 * remember that the various asn1_new_*() functions and asn1_add_child() are
 * implemented down below. */

/* This function takes an ASN.1 message and encodes its length field into a
 * given buffer, returning how many bytes that length field occupies. Passing a
 * NULL buffer causes it to return the size without actually writing anything. */
size_t asn1_serialize_len(const struct asn1 *root, uint8_t *buf, size_t len) {
	/* The ASN.1 length encoding is described in X.690 8.1.3. There are
	 * two forms: the short form which uses a single byte <= 0x7f, and the
	 * long form in which:
	 *   - the high bit of byte 0 is 1
	 *   - the low 7 bits of byte 0 encode n, the length of the length
	 *   - the remaining n bytes encode the length, most significant
	 *     byte first
	 */

	/* Every length is going to require at least one byte to serialize... */
	assert(!buf || len > 0);

	if (root->len < 0x80) {
		if (buf)
			buf[0] = root->len;
		return 1;
	}

	/* Now figure out how many bytes the length will take: */
	size_t lenlen = 0;
	size_t dlen = root->len;
	while (dlen) {
		dlen = dlen >> 8;
		lenlen++;
	}

	if (!buf)
		return 1 + lenlen;
	assert(len >= 1 + lenlen);

	/* Set the high bit of the length to indicate that this is in long form,
	 * and encode the rest of the length's length in the low bits. */
	buf[0] = 0x80 | lenlen;
	dlen = root->len;

	/* And write the bytes of the length out, backwards so that the lowest 8
	 * bits of it are at the end. */
	for (size_t j = lenlen; j > 0; j--) {
		buf[j] = dlen & 0xff;
		dlen = dlen >> 8;
	}
	return 1 + lenlen;
}

/* This serializes a given ASN.1 message into a given buffer, which must be long
 * enough to receive the message. It does that by, if the message is a
 * primitive, simply copying its encoded bytes into the buffer (since primitive
 * messages are stored encoded) or, if it is composite, then by encoding all of
 * its submessages into the buffer. The tag and computed length are always added
 * before the serialized message bodies in either case. */
size_t asn1_serialize(const struct asn1 *root, uint8_t *buf, size_t len) {
	assert(len >= asn1_encoded_length(root));
	assert(root->flags & ASN1F_FINISHED);

	buf[0] = root->type;
	size_t lenlen = asn1_serialize_len(root, buf + 1, len - 1);
	if (asn1_is_primitive(root)) {
		/* Primitive -> just copy the already-encoded body in. */
		memcpy(buf + 1 + lenlen, root->val.b, root->len);
		return 1 + lenlen + root->len;
	} else {
		size_t i = 1 + lenlen;

		/* Composite -> serialize all the submessages, keeping track of
		 * how much buffer space we've used so far. */
		for (struct asn1 *c = root->val.l; c; c = c->next) {
			/* Hand the child only the space that's left. If we've
			 * somehow walked to the end of the buffer, that's zero
			 * bytes, and the child's own length assert will fire. */
			size_t left = len > i ? len - i : 0;
			i += asn1_serialize(c, buf + i, left);
		}
		return i;
	}
}

/* This is a wrapper function which determines how long a message would be when
 * serialized, allocates a buffer of that size, serializes into it, then returns
 * the allocated buffer. */
uint8_t *asn1_serialize_alloc(const struct asn1 *root, size_t *len) {
	size_t l = asn1_encoded_length(root);
	uint8_t *buf = malloc(l);
	if (!buf)
		ouch("out of memory");
	asn1_serialize(root, buf, l);
	*len = l;
	return buf;
}

/* This is also a wrapper function which returns the length a message would have
 * when serialized. */
size_t asn1_encoded_length(const struct asn1 *root) {
	assert((root->flags & ASN1F_FINISHED) && "length of unfinished node");
	assert(root->len >= 0 && "unmarked length");
	return 1 + asn1_serialize_len(root, NULL, 0) + root->len;
}

/* When we generate the actual certificate, we'll need to generate a keypair for
 * it, too, so that it can actually be used for anything. */
struct keypair {
	uint8_t sbuf[BR_EC_KBUF_PRIV_MAX_SIZE];
	uint8_t pbuf[BR_EC_KBUF_PUB_MAX_SIZE];

	/* These are filled in by bearssl as we generate the keys. */
	br_ec_private_key s;
	br_ec_public_key p;
};

/* The only thing we need to do with a keypair is generate it - signatures are
 * handled inline in x509_new_signature() below. */
void keypair_gen(struct keypair *kp);

/* So now in theory all we need to do is:
 * 1. Build an ASN.1 tree for the certificate and fill in all the fields;
 * 2. Serialize most of it (all the non-sig parts);
 * 3. Compute what the signature should be;
 * 4. Serialize the rest of it;
 * 5. Emit it wrapped up into a PEM.
 */

/* This function creates a new ASN.1 value mapping to a "common name" of the
 * supplied name. */
struct asn1 *x509_new_commonname(const char *name) {
	/* A bit about names in X.509: because X.509 gets its view of names from
	 * X.500, names are extremely general. X.500 was based on the
	 * (fundamentally misguided) idea that everyone would stick everything
	 * into a single hierarchical namespace, and as a result it has an
	 * extremely powerful idea of how to name things relative to other
	 * things. Nobody uses even a fraction of this power, and instead what
	 * people do[fn4] is generally jam a single field (called "common name")
	 * in with a string name that has some meaning itself.
	 *
	 * However, because the mechanism is very general, the encoding is also
	 * very general. A name is actually:
	 * 	SEQUENCE OF {
	 *		SET OF {
	 *			SEQUENCE {
	 *				OID key;
	 *				ANY value;
	 *			}
	 *		}
	 *	}
	 * where each element of the outer sequence is notionally relative to
	 * the previous level, so each of those elements (the SETs) is called a
	 * "relative distinguished name". We just construct a sequence of
	 * length 1, containing a set of size 1, containing a sequence
	 * containing OID_COMMON_NAME and the given name as a string. Woo. */
	struct asn1 *outer = asn1_new_sequence();
	struct asn1 *set = asn1_new_set();

	struct asn1 *inner = asn1_new_sequence();
	asn1_add_child(inner, asn1_new_oid(OID_COMMON_NAME));
	asn1_add_child(inner, asn1_new_utf8string(name));
	asn1_finish(inner);

	asn1_add_child(set, inner);
	asn1_finish(set);

	asn1_add_child(outer, set);
	asn1_finish(outer);

	return outer;
}
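
/* To see how little actually comes out of all that nesting, here's a minimal
 * sketch that assembles the same structure by hand, byte by byte.
 * cn_name_bytes() is a made-up helper that only copes with names short enough
 * for every length to fit the one-byte short form, so it's an illustration
 * rather than a replacement for the function above. It sits inside this
 * comment so it stays out of the real program:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: write the DER bytes of a Name holding a single
// commonName into out. Returns the number of bytes written.
size_t cn_name_bytes(const char *cn, uint8_t *out, size_t outlen) {
	// The OID 2.5.4.3, already encoded with its tag and length.
	static const uint8_t OID_CN[] = { 0x06, 0x03, 0x55, 0x04, 0x03 };
	size_t slen = strlen(cn);
	size_t inner = sizeof(OID_CN) + 2 + slen; // OID + UTF8String
	size_t set = 2 + inner;                   // the inner SEQUENCE
	size_t outer = 2 + set;                   // the SET

	// Short-form lengths only, and the caller's buffer must be big enough.
	assert(outer <= 0x7f && outlen >= 2 + outer);

	size_t n = 0;
	out[n++] = 0x30; out[n++] = (uint8_t)outer; // outer SEQUENCE OF
	out[n++] = 0x31; out[n++] = (uint8_t)set;   // SET OF
	out[n++] = 0x30; out[n++] = (uint8_t)inner; // SEQUENCE { key, value }
	memcpy(out + n, OID_CN, sizeof(OID_CN));
	n += sizeof(OID_CN);
	out[n++] = 0x0c; out[n++] = (uint8_t)slen;  // UTF8String
	memcpy(out + n, cn, slen);
	return n + slen;
}
```

 * For "testing" this produces 20 bytes:
 *   30 12 31 10 30 0e 06 03 55 04 03 0c 07 74 65 73 74 69 6e 67
 * of which only the last 7 are the name itself.
 */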

/* The X.509 time encoding rules are ridiculous. ASN.1 has two separate time
 * types, UTCTime and GeneralizedTime, where GeneralizedTime is basically
 * UTCTime with 4-digit years instead of 2-digit and with the option for
 * fractional seconds. Obviously, the correct thing to do is simply use
 * GeneralizedTime for all time fields, since UTCTime cannot represent a lot of
 * dates, and in a just universe, that is what would happen.
 *
 * Unfortunately, RFC 5280 4.1.2.5 requires the following unhinged behavior
 * instead: if the time's year is 2049 or less, the time is represented using
 * UTCTime, with the rule that if the year field in that UTCTime is >= 50 it is
 * interpreted as being 19YY, and otherwise it is interpreted as being 20YY, so
 * a year field of 87 means 1987 and a year field of 37 means 2037. If the
 * time's year is 2050 or more, instead one simply uses GeneralizedTime. This
 * insane hack with the UTCTime year field is local to X.509, too, so we can't
 * properly ask the ASN.1 serializer to sort it out for us.
 *
 * Oh, and to add insult to injury, these are represented on the wire as
 * strings. */
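
/* Here's a minimal sketch of that rule that produces the wire strings
 * directly, leaning on strftime(): %y already yields the two-digit year that
 * UTCTime wants, and %Y the four-digit year that GeneralizedTime wants.
 * x509_time_string() is a made-up name, it assumes a time_t wide enough to
 * reach 2050, and it doesn't try to handle years before 1950 (which UTCTime
 * can't express under this rule anyway). It may well differ from what
 * asn1_new_utctime() and asn1_new_generalizedtime() do further down. It's
 * inside a comment so it isn't part of the real program:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

// Hypothetical helper: format t the way RFC 5280 4.1.2.5 asks for. Sets
// *generalized to 1 if the result is a GeneralizedTime string and to 0 if it
// is a UTCTime string. Returns the string's length, or 0 on failure.
size_t x509_time_string(time_t t, char *out, size_t outlen, int *generalized) {
	assert(out && generalized);

	const struct tm *tm = gmtime(&t);
	if (!tm)
		return 0;

	// tm_year counts from 1900, so 150 means the year 2050.
	*generalized = tm->tm_year >= 150;

	// Both forms end in a literal 'Z', meaning UTC.
	return strftime(out, outlen,
		*generalized ? "%Y%m%d%H%M%SZ" : "%y%m%d%H%M%SZ", tm);
}
```

 * The Unix epoch comes out as the UTCTime "700101000000Z", the last second of
 * 2049 as "491231235959Z", and one second later the format flips over to the
 * GeneralizedTime "20500101000000Z".
 */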

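/* These are spelled out in full rather than in terms of each other because a
 * const variable isn't a constant expression in C, so strictly speaking it
 * can't appear in a file-scope initializer. */
const unsigned long HOURS = 60UL * 60UL;
const unsigned long DAYS = 24UL * 60UL * 60UL;
const unsigned long YEARS = 365UL * 24UL * 60UL * 60UL;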

struct asn1 *x509_new_time(time_t time) {
	struct tm *tm = gmtime(&time);

	if (tm->tm_year >= 150)
		/* Thank goodness! */
		return asn1_new_generalizedtime(time);

	if (tm->tm_year >= 100)
		/* 2037 gets mapped to 1937, which then gets represented by
		 * UTCTime as "37". Note that this will probably mess up the
		 * exact day/time since years aren't all 100 * YEARS long, so,
		 * you know, caveat emptor. */
		return asn1_new_utctime(time - 100 * YEARS);
	else
		/* 1987 is just treated as 1987 and UTCTime represents it as
		 * "87". */
		return asn1_new_utctime(time);
}

struct asn1 *x509_new_algid() {
	/* An "algorithm" is identified by (not joking) an "algorithm
	 * identifier", which is an OID, and also a set of parameters,
	 * which are an ANY in the ASN.1 definition of this type. What
	 * that ANY actually contains depends on the algorithm
	 * identifier.
	 *
	 * For our particular case, we'll only support using elliptic
	 * curve keys (OID 1.2.840.10045.2.1). We'll have to skip over
	 * to RFC 3279 to find out what to put in the ANY:
	 *
	 * => https://www.rfc-editor.org/rfc/rfc3279.txt
	 *
	 * ... which says that the parameters are one of: an explicit
	 * set of EC parameters, a specific "named curve", or
	 * "implicitlyCA", which means that the parameters field is
	 * null and the EC parameters are inherited from the CA. As
	 * far as I know everybody uses named curves for this. A named
	 * curve is itself referenced by an OID, so we store that here
	 * too. */

	struct asn1 *algid = asn1_new_sequence();
	asn1_add_child(algid, asn1_new_oid(OID_ECPUBKEY));
	asn1_add_child(algid, asn1_new_oid(OID_SECP384R1));
	asn1_finish(algid);
	return algid;
}

/* Now we need to talk about what's actually inside those certificates. For that
 * we have to turn to RFC 5280, "Internet X.509 Public Key Infrastructure
 * Certificate and Certificate Revocation List Profile".
 *
 * => https://www.rfc-editor.org/rfc/rfc5280.txt
 *
 * and head down to section 4.1, "basic certificate fields", where we will find
 * two structures: Certificate and TBSCertificate. TBSCertificate contains all
 * the "certificate stuff", and then is wrapped up and signed to form the
 * Certificate, so the general flow is that:
 * 1. You make a TBSCertificate
 * 2. You send it[fn6] to a CA who checks whatever (your identity, your ownership
 *    of the domain, whether your credit card payment has cleared, etc)
 * 3. The CA signs the TBSCertificate, producing a Certificate, and sends it
 *    back to you
 * 4. You put the Certificate into your web server
 */

struct asn1 *x509_new_tbscert(const struct keypair *kp, const char *subject) {
	struct asn1 *tbs = asn1_new_sequence();

	/* There is a version field right at the start, but because we're
	 * leaving it at the default value (0, meaning X.509 v1), we omit it. */

	/* cert serial number */
	asn1_add_child(tbs, asn1_new_int(0));

	/* the "signature" field, which is actually the algorithm id for the
	 * signature on this certificate, so it has to match the one on the
	 * outer certificate structure. In this case, it's represented as a
	 * sequence with only one OID in it, although theoretically it could
	 * have more parameters or whatever. */
	{
		struct asn1 *alg = asn1_new_sequence();
		asn1_add_child(alg, asn1_new_oid(OID_ECDSA_SHA256));
		asn1_finish(alg);

		asn1_add_child(tbs, alg);
	}

	/* TODO: support issuer != subject */
	asn1_add_child(tbs, x509_new_commonname(subject));

	{
		time_t now = time(NULL);
		time_t notbefore = now - 30 * DAYS;
		time_t notafter = now + 10 * YEARS;

		struct asn1 *validity = asn1_new_sequence();
		asn1_add_child(validity, x509_new_time(notbefore));
		asn1_add_child(validity, x509_new_time(notafter));
		asn1_finish(validity);

		asn1_add_child(tbs, validity);
	}

	asn1_add_child(tbs, x509_new_commonname(subject));

	{
		struct asn1 *spki = asn1_new_sequence();

		asn1_add_child(spki, x509_new_algid());
		asn1_add_child(spki, asn1_new_bitstring(kp->p.q, kp->p.qlen));
		asn1_finish(spki);

		asn1_add_child(tbs, spki);
	}

	/* If we were generating an X.509v3 certificate (we're not), we could
	 * include a list of extensions here, which are a sequence of one or
	 * more of:
	 *
	 * 	SEQUENCE {
	 * 		OID id;
	 *		BOOL critical;
	 *		OCTETSTRING value;
	 *	}
	 *
	 * where the values are themselves DER-encoded ASN.1 messages whose form
	 * depends on the extension id, and the critical flag means "you must
	 * understand this extension to use this certificate". This mechanism is
	 * used for a LOT of stuff, most notably including subject alt names
	 * (used to allow the same cert for multiple hostnames, very common in
	 * HTTPS setups) and "basicConstraints", which are what actually mark
	 * certs as *not* being CA certs (!) in the HTTPS PKI. Yikes.
	 */

	asn1_finish(tbs);
	return tbs;
}

struct asn1 *cms_new_privkey(const struct keypair *kp) {
	/* RFC 5958, "Asymmetric Key Packages", is the current home of the
	 * PKCS #8 private key format (it packages keys up for use with the
	 * "Cryptographic Message Syntax", or CMS). That document specifies that
	 * you can put a single PrivateKeyInfo between PEM "PRIVATE KEY" tags,
	 * and that a PrivateKeyInfo is:
	 * 	SEQUENCE {
	 *		INT version;
	 *		AlgorithmId algorithm;
	 *		OCTETSTRING privatekey;
	 *		... and a couple of optional things we don't care about.
	 *	}
	 * so the encoding looks pretty straightforward here, but you can go read
	 * RFC 5958 if you want to learn all the gory details:
	 *
	 * => https://www.rfc-editor.org/rfc/rfc5958.txt
	 *
	 * However... what is the privatekey OCTETSTRING supposed to actually
	 * be? RFC 5958 says "the algorithm identifier dictates the format of
	 * the key" with no further elaboration.
	 *
	 * To find the actual answer, we have to go look at RFC 5915, "Elliptic
	 * Curve Private Key Structure":
	 *
	 * => https://www.rfc-editor.org/rfc/rfc5915.txt
	 *
	 * Which says that the OCTETSTRING is meant to contain:
	 *
	 * SEQUENCE {
	 * 	INTEGER version (1)
	 *	OCTETSTRING privateKey
	 *	ECParameters parameters OPTIONAL
	 *	BITSTRING publicKey OPTIONAL
	 * }
	 *
	 * so all we need to emit is the version and the actual private key - if
	 * we leave the public key out, it can be re-derived from the private
	 * key anyway. Let's do it:
	 */

	struct asn1 *priv = asn1_new_sequence();
	asn1_add_child(priv, asn1_new_int(0));
	asn1_add_child(priv, x509_new_algid());

	{
		struct asn1 *pki = asn1_new_sequence();
		asn1_add_child(pki, asn1_new_int(1));
		asn1_add_child(pki, asn1_new_octetstring(kp->s.x, kp->s.xlen));
		asn1_finish(pki);

		asn1_add_child(priv, asn1_new_octetstring_wrapping(pki));
	}

	asn1_finish(priv);

	return priv;
}

void sha256(const uint8_t *in, size_t inlen, uint8_t *out) {
	br_sha256_context ctx;
	br_sha256_init(&ctx);
	br_sha256_update(&ctx, in, inlen);
	br_sha256_out(&ctx, out);
}

/* Time to generate an actual signature! */
struct asn1 *x509_new_signature(struct asn1 *tosign, const struct keypair *kp) {
	const br_ec_impl *impl;
	br_ecdsa_sign signer;
	uint8_t hash[br_sha256_SIZE];
	uint8_t sigbuf[512];
	size_t siglen;

	/* Serialize the whole tosign message into a buffer, then sha256 that
	 * buffer, then discard the buffer - we only need the hash for the rest
	 * of this. */
	{
		size_t len;
		uint8_t *buf = asn1_serialize_alloc(tosign, &len);
		sha256(buf, len, hash);
		free(buf);
	}

	impl = br_ec_get_default();
	signer = br_ecdsa_sign_raw_get_default();
	siglen = signer(impl, &br_sha256_vtable, hash, &kp->s, sigbuf);
	if (!siglen)
		ouch("can't sign");

	/* An ECDSA signature is a pair of integers, called r and s. We get them
	 * back from bearssl packed into a single buffer, padded with leading
	 * zero bytes so that they're both the same length. The actual signature
	 * format we need here is the ASN.1 signature format from RFC 3279
	 * section 2.2.3, which looks like:
	 *   SEQUENCE { r INTEGER, s INTEGER };
	 * so we'll now need to pull both of those values out of the raw
	 * signature, encode them as ASN.1 integers, then pack them into an
	 * ASN.1 sequence.
	 *
	 * Do note that bearssl can do this for us, via either:
	 *   br_ecdsa_sign_asn1_get_default() (which returns an ASN.1 signature)
	 *   br_ecdsa_raw_to_asn1()
	 * but we'll do it by hand here - I wrote a partial ASN.1 encoder and
	 * I'm dang well going to use it.
	 */
	struct asn1 *sig = asn1_new_sequence();
	asn1_add_child(sig, asn1_new_bigint(sigbuf, siglen / 2));
	asn1_add_child(sig, asn1_new_bigint(sigbuf + siglen / 2, siglen / 2));
	asn1_finish(sig);

	struct asn1 *wrappedsig = asn1_new_bitstring_wrapping(sig);
	asn1_free(sig);
	return wrappedsig;
}
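
/* The fiddly part of turning r and s into ASN.1 integers is the padding, so
 * here's a minimal sketch of it. DER wants the shortest two's complement
 * encoding, which for our unsigned values means dropping the leading zero
 * bytes bearssl padded with, then putting a single 0x00 back if the top bit
 * of what's left is set (otherwise the number would read as negative).
 * der_uint_body() is a made-up helper and may not match asn1_new_bigint()
 * below line for line. It's inside a comment so it stays out of the real
 * program:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: write the content bytes (no tag, no length) of a DER
// INTEGER holding the unsigned big-endian number in[0..inlen). Returns the
// number of bytes written, or 0 if the input is empty or out is too small.
size_t der_uint_body(const uint8_t *in, size_t inlen,
                     uint8_t *out, size_t outlen) {
	assert(in && out);

	// Drop leading zero bytes, but keep at least one byte so that zero
	// itself still encodes as a single 0x00.
	while (inlen > 1 && in[0] == 0x00) {
		in++;
		inlen--;
	}

	// A set top bit would make the value look negative, so it needs one
	// 0x00 byte in front.
	size_t pad = (inlen > 0 && (in[0] & 0x80)) ? 1 : 0;
	if (inlen == 0 || outlen < pad + inlen)
		return 0;

	if (pad)
		out[0] = 0x00;
	memcpy(out + pad, in, inlen);
	return pad + inlen;
}
```

 * So the padded value 00 00 42 shrinks to the single byte 42, while cc grows
 * to 00 cc, which is the same leading zero the self-test further down expects
 * for asn1_new_int(0xcc).
 */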

/* The cert itself is super simple: the tbscert, then an algorithm id for what
 * type of signature algorithm we used (ECDSA-with-SHA256, natch), then the
 * signature itself. Easy! */
struct asn1 *x509_new_cert(struct asn1 *tbscert, const struct keypair *kp) {
	struct asn1 *cert = asn1_new_sequence();
	asn1_add_child(cert, tbscert);
	{
		struct asn1 *seq = asn1_new_sequence();
		asn1_add_child(seq, asn1_new_oid(OID_ECDSA_SHA256));
		asn1_finish(seq);
		asn1_add_child(cert, seq);
	}
	asn1_add_child(cert, x509_new_signature(tbscert, kp));
	asn1_finish(cert);

	return cert;
}

void selftest();

int main() {
	struct keypair kp;

	selftest();
	keypair_gen(&kp);

	struct asn1 *tbscert = x509_new_tbscert(&kp, "testing");

	/* In theory, one could pass a different keypair here to sign the cert
	 * with some other key; that would generate a cert signed by a CA. Here,
	 * we're generating a self-signed cert instead. */
	struct asn1 *cert = x509_new_cert(tbscert, &kp);

	/* Encode the private key as a PKCS #8 PrivateKeyInfo: */
	struct asn1 *priv = cms_new_privkey(&kp);

	asn1_emit_pem(cert, "CERTIFICATE");
	asn1_emit_pem(priv, "PRIVATE KEY");

	/* Ta-da! Cert and private key generated, and ready to be slurped up by
	 * any SSL library. */

	return 0;
}

/* This function takes an ASN.1 message and emits it as a PEM message. To do
 * that, it has to serialize the entire thing, then base64 encode its body. */
void asn1_emit_pem(const struct asn1 *asn1, const char *tag) {
	printf("-----BEGIN %s-----\n", tag);

	/* What column to wrap the base64 text at; this must be divisible by 4
	 * or bad things will happen. If this was C11 we'd write:
	 *   enum { WRAP_AT = 64 };
	 *   static_assert(!(WRAP_AT % 4));
	 * but it's not so here we are. */
	const size_t WRAP_AT = 64;

	const char *B64 =
		"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
		"abcdefghijklmnopqrstuvwxyz"
		"0123456789+/";
	size_t written = 0;

	size_t len;
	uint8_t *abuf = asn1_serialize_alloc(asn1, &len);
	uint8_t *buf = abuf;

	while (len > 0) {
		/* Every 3 input bytes turn into 4 base64 characters, and
		 * missing bytes at the end are treated as being 0 for this
		 * encoding. */
		uint8_t d0 = buf[0];
		uint8_t d1 = len > 1 ? buf[1] : 0;
		uint8_t d2 = len > 2 ? buf[2] : 0;

		/* aaaaaa aabbbb bbbbcc cccccc */
		char c0 = B64[(d0 >> 2)];
		char c1 = B64[((d0 & 0x3) << 4) | (d1 >> 4)];
		char c2 = B64[((d1 & 0xf) << 2) | (d2 >> 6)];
		char c3 = B64[d2 & 0x3f];

		/* Missing bytes are then dropped and replaced with '=' for
		 * padding. */
		if (len == 1)
			c2 = c3 = '=';
		if (len == 2)
			c3 = '=';

		printf("%c%c%c%c", c0, c1, c2, c3);
		written += 4;
		if (written % WRAP_AT == 0)
			printf("\n");

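		/* Consume up to 3 bytes, without stepping buf past the end of
		 * the buffer on a short final group. */
		size_t step = len > 3 ? 3 : len;
		buf += step;
		len -= step;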
	}
	if (written > 0 && written % WRAP_AT)
		printf("\n");
	printf("-----END %s-----\n", tag);
	free(abuf);
}

/* Generate a keypair using bearssl and stash it in kp. */
void keypair_gen(struct keypair *kp) {
	br_hmac_drbg_context rng;
	const br_ec_impl *impl;

	{
		/* Set up rng, which is our cryptographically secure random
		 * number generator. */
		br_prng_seeder seeder;
		if (!(seeder = br_prng_seeder_system(NULL)))
			ouch("no system rng");
		br_hmac_drbg_init(&rng, &br_sha256_vtable, NULL, 0);
		if (!seeder(&rng.vtable))
			ouch("can't seed rng");
	}


	/* Now generate the private key and compute the matching public key. */
	impl = br_ec_get_default();
	br_ec_keygen(&rng.vtable, impl, &kp->s, &kp->sbuf, BR_EC_secp384r1);
	br_ec_compute_pub(impl, &kp->p, &kp->pbuf, &kp->s);
}

/* A test helper - serialize a and make sure the encoding matches b of length
 * len. */
void t_match(struct asn1 *a, const uint8_t *b, size_t len) {
	uint8_t mbuf[4096];
	size_t mlen;

	mlen = asn1_serialize(a, mbuf, sizeof(mbuf));

	if (mlen != len || memcmp(mbuf, b, len)) {
		printhex("a", mbuf, mlen);
		printhex("b", b, len);
		assert(0);
	}
}

void selftest() {
	{
		struct asn1 *a = asn1_new_int(0xc36d6508);
		const uint8_t b[] = { 0x02, 0x05, 0x00, 0xc3, 0x6d, 0x65, 0x08 };
		t_match(a, b, sizeof(b));
	}
	{
		struct asn1 *a = asn1_new_int(0);
		const uint8_t b[] = { 0x02, 0x01, 0x00 };
		t_match(a, b, sizeof(b));
	}
	{
		struct asn1 *a = asn1_new_oid(OID_COMMON_NAME);
		const uint8_t b[] = { 0x06, 0x03, 0x55, 0x04, 0x03 };
		t_match(a, b, sizeof(b));
	}
	{
		struct asn1 *a = asn1_new_utf8string("Hello!");
		const uint8_t b[] = {
			0x0c, 0x06, 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x21,
		};
		t_match(a, b, sizeof(b));
	}
	{
		struct asn1 *a = asn1_new_sequence();
		asn1_add_child(a, asn1_new_int(0x42));
		asn1_add_child(a, asn1_new_int(0xcc));
		asn1_finish(a);
		const uint8_t b[] = {
			/* remember that our ints are unsigned but ASN.1 ints
			 * are signed, so 0xcc needs a leading 0 byte */
			0x30, 0x07, 0x02, 0x01, 0x42, 0x02, 0x02, 0x00, 0xcc,
		};
		t_match(a, b, sizeof(b));
	}
	{
		struct asn1 *a1 = asn1_new_sequence();
		asn1_add_child(a1, asn1_new_int(0x42));
		asn1_add_child(a1, asn1_new_int(0xcc));
		asn1_finish(a1);
		struct asn1 *a0 = asn1_new_bitstring_wrapping(a1);
		const uint8_t b[] = {
			0x03, 0x0a, 0x00,
			0x30, 0x07, 0x02, 0x01, 0x42, 0x02, 0x02, 0x00, 0xcc,
		};
		t_match(a0, b, sizeof(b));
	}
}

/* Allocate a new ASN.1 primitive type, including a buffer for its encoded
 * value. Primitive types are always considered finished since they can't have
 * any children and only have an immediate value. */
struct asn1 *_asn1_new_prim(enum asn1type type, ssize_t len) {
	struct asn1 *a = malloc(sizeof *a);
	if (!a)
		ouch("out of memory");
	memset(a, 0, sizeof(*a));
	a->val.b = malloc(len);
	if (!a->val.b)
		ouch("out of memory");
	memset(a->val.b, 0, len);
	a->len = len;
	a->type = type;
	a->flags |= ASN1F_FINISHED;
	return a;
}

struct asn1 *_asn1_new_comp(enum asn1type type) {
	struct asn1 *a = malloc(sizeof *a);
	if (!a)
		ouch("out of memory");
	memset(a, 0, sizeof(*a));
	a->type = type;
	a->len = -1;
	a->val.l = NULL;
	return a;
}

struct asn1 *asn1_new_int(uint64_t val) {
	/* Convert the value into a big-endian bigint, then treat the whole
	 * thing as a bigint. */
	uint8_t bytes[8];
	for (size_t i = sizeof(bytes); i; i--) {
		bytes[i - 1] = val & 0xff;
		val = val >> 8;
	}
	return asn1_new_bigint(bytes, sizeof(bytes));
}

struct asn1 *asn1_new_bigint(const uint8_t *v, size_t len) {
	/* X.690 section 8.3.2 requires that the bits of b[0] and the high bit
	 * of b[1] are neither all 0 nor all 1; since v is unsigned we don't
	 * want to drop leading all-1 bytes, but we do want to drop leading
	 * all-0 bytes. Do that here, until we find a nonzero byte or run out
	 * of bytes. We check len first so that we never read past the end of
	 * v when every byte is zero. */
	while (len && !*v) {
		v++;
		len--;
	}

	/* A wrinkle: if the top bit is set, we don't want that to become the
	 * sign bit and make the value negative, so we need to prepend a zero
	 * byte; ditto if there are no bytes in the result, we have to put in a
	 * zero byte so it has an actual value. */
	size_t prefix = (!len || v[0] & 0x80) ? 1 : 0;

	struct asn1 *a = _asn1_new_prim(ASN1_INTEGER, len + prefix);
	memcpy(a->val.b + prefix, v, len);

	return a;
}
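
/* An aside: the two functions above boil down to "big-endian bytes, strip
 * leading zeros, then maybe put one zero back". Here is that rule as a
 * self-contained sketch for a single uint64_t, with no allocation involved.
 * The name der_uint_content_sketch is made up for this aside, and it only
 * produces the content bytes, not the tag or length.
 *
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: write the DER INTEGER content bytes for the unsigned
// value val into out (room for 9 bytes) and return how many were written.
size_t der_uint_content_sketch(uint64_t val, uint8_t *out) {
	uint8_t be[8];
	assert(out != NULL);
	for (size_t i = sizeof(be); i; i--) {
		be[i - 1] = val & 0xff;
		val >>= 8;
	}

	// Skip leading zero bytes, but always keep at least one byte.
	size_t skip = 0;
	while (skip < sizeof(be) - 1 && be[skip] == 0)
		skip++;

	size_t n = 0;
	// A set top bit would read as a negative number, so pad with 0x00.
	if (be[skip] & 0x80)
		out[n++] = 0x00;
	while (skip < sizeof(be))
		out[n++] = be[skip++];
	return n;
}
```
 *
 * So 0x42 stays a single byte, 0xcc becomes 00 cc, and 0 becomes a lone 00,
 * matching the vectors in selftest().
 */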

size_t _asn1_oid_component(int component, uint8_t *buf, size_t len);

struct asn1 *asn1_new_oid(const oid oid) {
	/* When we're encoding an OID, each component is encoded as a
	 * variable-length integer, where each byte uses its high bit to
	 * indicate whether another byte follows and its low 7 bits to encode 7
	 * bits of the component. Also, the first two components (which are
	 * always present) are encoded as a single one - if the first two
	 * components are x and y, the first encoded component is x * 40 + y. I
	 * suppose this was done because in practice the values of x are always
	 * in { 0, 1, 2 } and it saves space.
	 *
	 * Unfortunately this encoding is fiddly enough that we have to do it
	 * into a temporary buffer here, so we can figure out how large the real
	 * buffer should be to store it in. Also, the 7-bit variable length
	 * encoding is actually done by _asn1_oid_component() just below. */
	uint8_t oidbuf[256];
	size_t rlen = sizeof(oidbuf);
	size_t b = _asn1_oid_component(oid[0] * 40 + oid[1], oidbuf, rlen);
	for (size_t i = 2; oid[i] != -1; i++) {
		/* b bytes of oidbuf are already used, so rlen - b remain. */
		b += _asn1_oid_component(oid[i], oidbuf + b, rlen - b);
	}

	struct asn1 *a = _asn1_new_prim(ASN1_OID, b);
	memcpy(a->val.b, oidbuf, b);
	return a;
}

size_t _asn1_oid_component(int comp, uint8_t *buf, size_t len) {
	uint8_t pb[6];
	size_t i = sizeof(pb);
	while (i > 0) {
		pb[--i] = comp & 0x7f;
		if (i != sizeof(pb) - 1)
			pb[i] |= 0x80;
		comp = comp >> 7;
		if (comp <= 0)
			break;
	}
	assert(len >= sizeof(pb) - i);
	memcpy(buf, pb + i, sizeof(pb) - i);
	return sizeof(pb) - i;
}
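
/* An aside: here is the OID encoding described above as one self-contained
 * sketch that you can poke at in isolation. The names ending in _sketch are
 * made up for this aside, and unlike the real code it takes an explicit
 * component count instead of a -1 terminator.
 *
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: write one OID component to buf in base 128, most
// significant group first, with the high bit set on every byte except the
// last. Returns the number of bytes written.
static size_t oid_component_sketch(uint32_t comp, uint8_t *buf, size_t room) {
	uint8_t tmp[5]; // 32 bits fit in five 7-bit groups
	size_t i = sizeof(tmp);
	do {
		tmp[--i] = comp & 0x7f;
		comp >>= 7;
	} while (comp);
	size_t n = sizeof(tmp) - i;
	assert(n <= room);
	for (size_t k = 0; k < n; k++)
		buf[k] = tmp[i + k] | (k + 1 < n ? 0x80 : 0x00);
	return n;
}

// Hypothetical helper: encode ncomp components as OID content bytes. The
// first two components share a single encoded value, x * 40 + y.
size_t oid_encode_sketch(const uint32_t *comps, size_t ncomp,
                         uint8_t *buf, size_t room) {
	assert(ncomp >= 2);
	size_t used = oid_component_sketch(comps[0] * 40 + comps[1], buf, room);
	for (size_t i = 2; i < ncomp; i++)
		used += oid_component_sketch(comps[i], buf + used, room - used);
	return used;
}
```
 *
 * For 2.5.4.3 (common name) this gives 55 04 03, and for 1.2.840.10045.2.1
 * (the EC public key OID) it gives 2a 86 48 ce 3d 02 01, where 840 and 10045
 * each need two bytes.
 */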

struct asn1 *asn1_new_bitstring(const uint8_t *bytes, size_t len) {
	struct asn1 *a = _asn1_new_prim(ASN1_BITSTRING, len + 1);

	/* X.690 8.6.2.2: the first octet encodes how many bits in the final
	 * byte are unused. Since we don't support bit strings whose length
	 * isn't a multiple of 8, that's always 0. The BITSTRINGs X.509 uses
	 * for keys and signatures are really just bytes anyway; flag fields
	 * like the KeyUsage extension are the exception. */
	a->val.b[0] = 0;
	memcpy(a->val.b + 1, bytes, len);
	return a;
}
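
/* An aside: if you ever did need a bit string that isn't a whole number of
 * bytes (the KeyUsage extension in RFC 5280 is a BIT STRING of flags, for
 * instance), the "unused bits" octet is where that shows up. Here is a rough
 * sketch of how that might look, working on a plain buffer. The name
 * bitstring_content_sketch is made up for this aside, and it doesn't trim
 * trailing zero bits the way DER wants for named-flag bit strings.
 *
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: fill out with BIT STRING content bytes for the first
// nbits bits of bits, most significant bit first. out needs room for
// 1 + (nbits + 7) / 8 bytes. Returns the number of bytes written.
size_t bitstring_content_sketch(const uint8_t *bits, size_t nbits,
                                uint8_t *out) {
	size_t nbytes = (nbits + 7) / 8;
	uint8_t unused = (uint8_t)((8 - nbits % 8) % 8);
	assert(out != NULL);

	out[0] = unused;
	memcpy(out + 1, bits, nbytes);
	// The unused trailing bits of the last byte have to be zero.
	if (unused)
		out[nbytes] &= (uint8_t)(0xff << unused);
	return 1 + nbytes;
}
```
 *
 * A single set bit comes out as 07 80: seven unused bits, then the one bit we
 * care about at the top of the byte.
 */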

struct asn1 *asn1_new_octetstring(const uint8_t *bytes, size_t len) {
	struct asn1 *a = _asn1_new_prim(ASN1_OCTETSTRING, len);
	memcpy(a->val.b, bytes, len);
	return a;
}

struct asn1 *asn1_new_utf8string(const char *str) {
	size_t len = strlen(str);
	struct asn1 *a = _asn1_new_prim(ASN1_UTF8STRING, len);
	memcpy(a->val.b, str, len);
	return a;
}

struct asn1 *asn1_new_utctime(time_t time) {
	struct tm *tm = gmtime(&time);
	char buf[32];
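	/* tm_year counts years since 1900, and UTCTime only has room for the
	 * last two digits of the year. */
	size_t len = snprintf(buf, sizeof(buf), "%02d%02d%02d%02d%02d%02dZ",
	                      tm->tm_year % 100, tm->tm_mon + 1, tm->tm_mday,
	                      tm->tm_hour, tm->tm_min, tm->tm_sec);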

	struct asn1 *a = _asn1_new_prim(ASN1_UTCTIME, len);
	memcpy(a->val.b, buf, len);
	return a;
}

struct asn1 *asn1_new_generalizedtime(time_t time) {
	struct tm *tm = gmtime(&time);
	char buf[32];
	size_t len = snprintf(buf, sizeof(buf), "%04d%02d%02d%02d%02d%02dZ",
	                      tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday,
	                      tm->tm_hour, tm->tm_min, tm->tm_sec);

	/* GeneralizedTime has its own universal tag, 0x18; UTCTime is 0x17. */
	struct asn1 *a = _asn1_new_prim((enum asn1type)0x18, len);
	memcpy(a->val.b, buf, len);
	return a;
}
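
/* An aside: which of the two time formats do you pick? RFC 5280 section
 * 4.1.2.5 says validity dates through 2049 go out as UTCTime and dates in 2050
 * or later as GeneralizedTime. Here is a sketch of that choice as a standalone
 * formatter; the name validity_time_sketch is made up for this aside, and it
 * assumes dates from 1950 onward, since that's all a two-digit UTCTime year
 * can express.
 *
```c
#include <stddef.h>
#include <stdio.h>
#include <time.h>

// Hypothetical helper: format t into out for a certificate validity field.
// Returns 1 if the result is a GeneralizedTime, 0 if it is a UTCTime, or -1
// if the time could not be converted or out is too small.
int validity_time_sketch(time_t t, char *out, size_t room) {
	struct tm *tm = gmtime(&t);
	if (!tm)
		return -1;

	int year = tm->tm_year + 1900; // tm_year counts from 1900
	int generalized = year >= 2050;
	int n;
	if (generalized)
		n = snprintf(out, room, "%04d%02d%02d%02d%02d%02dZ",
		             year, tm->tm_mon + 1, tm->tm_mday,
		             tm->tm_hour, tm->tm_min, tm->tm_sec);
	else
		n = snprintf(out, room, "%02d%02d%02d%02d%02d%02dZ",
		             year % 100, tm->tm_mon + 1, tm->tm_mday,
		             tm->tm_hour, tm->tm_min, tm->tm_sec);
	if (n < 0 || (size_t)n >= room)
		return -1;
	return generalized;
}
```
 *
 * The Unix epoch comes out as the UTCTime "700101000000Z", while midnight on
 * 2050-01-01 comes out as the GeneralizedTime "20500101000000Z".
 */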

struct asn1 *asn1_new_sequence() {
	return _asn1_new_comp(ASN1_SEQUENCE);
}

struct asn1 *asn1_new_set() {
	return _asn1_new_comp(ASN1_SET);
}

struct asn1 *asn1_new_bitstring_wrapping(struct asn1 *child) {
	assert(child->flags & ASN1F_FINISHED);

	size_t len;
	uint8_t *buf = asn1_serialize_alloc(child, &len);

	struct asn1 *a = asn1_new_bitstring(buf, len);
	free(buf);

	return a;
}

struct asn1 *asn1_new_octetstring_wrapping(struct asn1 *child) {
	assert(child->flags & ASN1F_FINISHED);

	size_t len = asn1_encoded_length(child);
	struct asn1 *a = _asn1_new_prim(ASN1_OCTETSTRING, len);
	asn1_serialize(child, a->val.b, len);
	return a;
}

void asn1_add_child(struct asn1 *parent, struct asn1 *child) {
	struct asn1 **p = &parent->val.l;
	/* Walk down the child list until we find a next pointer that is NULL,
	 * then scribble over that next pointer with the new node we're trying
	 * to link. This is a C idiom for linking to the tail of a singly-linked
	 * list. */
	while (*p)
		p = &((*p)->next);
	*p = child;
	child->next = NULL;
}
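
/* An aside: the pointer-to-pointer idiom above is easier to see on a toy list
 * with nothing else going on. This sketch uses made-up names (node_sketch,
 * list_append_sketch) and has nothing to do with ASN.1. The point is that p
 * starts out aimed at the head pointer itself, so the empty list needs no
 * special case.
 *
```c
#include <stddef.h>

// Hypothetical toy list node, just for this aside.
struct node_sketch {
	int value;
	struct node_sketch *next;
};

// Append n to the list whose head pointer lives at *head.
void list_append_sketch(struct node_sketch **head, struct node_sketch *n) {
	struct node_sketch **p = head;
	// p always points at the pointer we would overwrite: first the head
	// itself, then each node's next field in turn.
	while (*p)
		p = &(*p)->next;
	*p = n;
	n->next = NULL;
}
```
 *
 * The cost is a walk over the whole list for every append, which is fine for
 * the handful of children our ASN.1 nodes have.
 */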

int asn1_is_primitive(const struct asn1 *asn1) {
	/* ASN.1 uses bit 5 of the tag byte (mask 0x20; X.690 counts bits from
	 * 1 and calls it bit 6) to indicate whether the type is constructed
	 * (composite) or primitive, yay! */
	return !(asn1->type & 0x20);
}
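
/* An aside: the constructed bit is one of three fields packed into the tag
 * byte (see [fn2]). Here is a small sketch that pulls all three apart, using
 * made-up names (tag_sketch, tag_split_sketch) and assuming a single-byte
 * tag, which is all this program ever emits.
 *
```c
#include <stdint.h>

// Hypothetical struct holding the three fields of a one-byte tag.
struct tag_sketch {
	unsigned cls;         // 0 universal, 1 application, 2 context, 3 private
	unsigned constructed; // 1 if the value is built out of other values
	unsigned number;      // low five bits; 31 would mean a multi-byte tag
};

// Split a single tag byte into its class, constructed bit and tag number.
struct tag_sketch tag_split_sketch(uint8_t tag) {
	struct tag_sketch t;
	t.cls = tag >> 6;
	t.constructed = (tag >> 5) & 1;
	t.number = tag & 0x1f;
	return t;
}
```
 *
 * So 0x30 (SEQUENCE) is universal, constructed, number 16, and 0x02 (INTEGER)
 * is universal, primitive, number 2.
 */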

void asn1_free(struct asn1 *asn1) {
	if (asn1_is_primitive(asn1)) {
		free(asn1->val.b);
	} else {
		struct asn1 *c = asn1->val.l;
		while (c) {
			struct asn1 *n = c->next;
			asn1_free(c);
			c = n;
		}
	}
	free(asn1);
}

void asn1_finish(struct asn1 *node) {
	assert(!(node->flags & ASN1F_FINISHED));
	node->flags |= ASN1F_FINISHED;

	/* Note: DER requires that the elements of a SET OF are placed in
	 * ascending order of their encoded values, compared byte by byte
	 * (X.690 section 11.6). We don't do that here because the SETs we use
	 * only ever contain one element. */

	size_t clen = 0;
	for (struct asn1 *c = node->val.l; c; c = c->next)
		clen += asn1_encoded_length(c);
	node->len = clen;
}
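
/* An aside: if we ever did need SETs with more than one element, the sorting
 * that DER asks for could look something like this sketch. It works on
 * already-encoded elements held in a made-up blob_sketch struct rather than
 * on struct asn1, and it leans on qsort() and memcmp() from the C library.
 * When one encoding is a prefix of the other, the shorter one goes first,
 * which agrees with X.690's rule of padding the shorter one with zero bytes
 * before comparing.
 *
```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical holder for one fully encoded element (tag, length, contents).
struct blob_sketch {
	const uint8_t *bytes;
	size_t len;
};

// qsort() comparator: order two encodings byte by byte.
static int blob_cmp_sketch(const void *pa, const void *pb) {
	const struct blob_sketch *a = pa;
	const struct blob_sketch *b = pb;
	size_t common = a->len < b->len ? a->len : b->len;
	int c = memcmp(a->bytes, b->bytes, common);
	if (c)
		return c;
	// Equal prefixes: the shorter encoding sorts first.
	return (a->len > b->len) - (a->len < b->len);
}

// Sort the n encoded elements of a SET OF into DER order, in place.
void set_of_sort_sketch(struct blob_sketch *elems, size_t n) {
	qsort(elems, n, sizeof(elems[0]), blob_cmp_sketch);
}
```
 *
 * After sorting, the elements would be written out back to back as the
 * contents of the SET.
 */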

/* And that's it for now! Thanks for reading :)
 *
 * For your convenience here are all the standards referenced in this post:
 *
 * => https://www.rfc-editor.org/rfc/rfc2986.txt certification requests (CSRs)
 * => https://www.rfc-editor.org/rfc/rfc3278.txt ECC in CMS
 * => https://www.rfc-editor.org/rfc/rfc3279.txt X.509 algorithms and identifiers
 * => https://www.rfc-editor.org/rfc/rfc5280.txt X.509 certificate and CRL profile
 * => https://www.rfc-editor.org/rfc/rfc5480.txt ECC public keys
 * => https://www.rfc-editor.org/rfc/rfc5915.txt ECC private keys
 * => https://www.rfc-editor.org/rfc/rfc5958.txt asymmetric key packages
 * => https://www.rfc-editor.org/rfc/rfc7468.txt PEM
 *
 * => https://www.itu.int/rec/T-REC-X.500 ITU X.500: the Directory
 * => https://www.itu.int/rec/T-REC-X.509 ITU X.509: X.509 certs (surprise!)
 * => https://www.itu.int/rec/T-REC-X.520 ITU X.520: Directory attribute types
 * => https://www.itu.int/rec/T-REC-X.690 ITU X.690: ASN.1 encoding rules (BER/DER)
 *
 * [fn1]: the basic types and stuff are self-describing, but the schema language
 *        is very weird and there are often constraints which are part of the
 *        schema but *not* described on the wire; also, when you use CHOICE or
 *        ANY, there's no discriminant between the various choices, so
 *        sometimes you might just have to guess what's there from context. To
 *        make matters worse, to avoid introducing dependencies between specs,
 *        it is common to smuggle messages around as OCTETSTRINGs.
 *
 * [fn2]: in actuality, the ASN.1 tags are varints, and there are some rules
 *        for how bits in the first byte of the varint are used to give them
 *        crude namespaces. Bits 7 and 6 are the "class" bits of the tag (of
 *        which only "universal" gets any use here), bit 5 is the "constructed"
 *        bit (meaning like, composite vs primitive type), and bits 4 through
 *        0 are part of the tag number. In theory, you could create a LOT of
 *        user-defined types and differentiate them using tags; in practice,
 *        nobody does that, and the fact that the tag is a varint is an
 *        obscure feature.
 *
 * [fn3]: OIDs are from an era where it was considered extremely trendy to take
 *        every concept that exists and try to place them into a single
 *        hierarchy, as evidenced by the way X.500 worked as well. OIDs actually
 *        are used, but mostly for items that really are in tightly-defined
 *        hierarchies, and it will not surprise you to hear that the
 *        awe-inspiring power of the system is pretty sparsely applied.
 *
 * [fn4]: They actually do often stick some other stuff in there, especially
 *        "O" (meaning "Organization") and "OU" (meaning "Organizational
 *        Unit", a satisfyingly vague committee term). For example, the Let's
 *        Encrypt CA certificate my website uses has an O of "Let's Encrypt",
 *        a C (country) of "US", and a common name of "R3" (?).
 *
 * [fn5]: The situation with ASN.1 and strings could fill an entire blog post.
 *        The standard is old enough, and hairy enough, that when it was
 *        drafted it was not yet obvious that UTF-8 was the correct way to do
 *        strings, so there are many, many, many types of strings in ASN.1 with
 *        different arcane rules.
 *
 * [fn6]: In fact you send them a CSR, or "Certificate Signing Request", which
 *        is yet another ASN.1 message defined by yet another RFC: RFC 2986,
 *        section 4.1. You might've seen these as ".csr" files.
 */