Advent of Code: Libraries

I recently finished getting all 500 stars in Advent of Code, working mostly in C. One of the obstacles to solving these problems in C is that unlike most language standard libraries libc supplies only a small set of data structures and operations on them. All you have in ANSI C are:

One sorting algorithm: qsort(3)
Two searches: linear search (lsearch(3)) and binary search (bsearch(3))
A single, global hash table via hsearch(3), with a pretty nasty interface that requires keys to be null-terminated strings
Binary trees via tsearch(3), but with a very limiting interface that doesn't allow you to pass any context into the callback functions

This isn't enough to do most tasks and it's certainly not enough to do Advent of Code, so I needed to build some data structures of my own. In particular, I needed:

A queue, so I can implement breadth-first search
A real hash table, ideally keyed by types that aren't just strings; this gets used for everything else so efficiency is important
A point map type (a map from 3d coordinate triples to values) - this isn't just a hash table for reasons I'll explain below
A graph type, with values stored in the nodes

Most of these are straightforward for any programmer to implement, so I don't really feel like writing about the implementations in any details. Instead, I want to write about a problem I faced building generic versions of these types, and what I did about it.

These types are all conceptually generic: a queue (for example) is actually a queue of values of some other type T, so you can say that queue takes a "type parameter" called T and is of type queue<T>. A hash table is a type that takes two parameters K and V (the key and value type respectively) and is of type map<K, V>.

So far so simple, but C doesn't have parameterized types like this, so we'll need to figure out how to actually implement it. For a normal non-generic type like this:

        struct foo {
          int a;
          int b;
          int c;
        };

        int b_plus_c(const struct foo *f) {
                return f->b + f->c;
        }

The size of struct foo is known at compile-time, and so is the code that needs to be generated to access, say, foo.b. However, if we instead gave it a hypothetical type parameter:

        struct foo<T> {
          T a;
          int b;
          int c;
        }

        int b_plus_c(const struct foo<T> *f) {
                return f->b + f->c;
        }

then the size of struct foo<T> actually depends on the size of T, and worse, the code the compiler needs to generate for b_plus_c depends on what T is, because b and c are at different offsets in struct foo<T> depending on the size of T. Even simple operations like putting a value of type struct foo<T> on the stack depend on the value of T.

There are two approaches language designers can take to solve this problem: "boxing" and "monomorphization", which sounds fancy but is actually very simple. "Boxing" works as follows: the crux of the problem is that different types are of different sizes, so what if we always put the type in a "box" that has the same size regardless of the size of the underlying type? When done in C, that might look like this:

        struct foo<T> {
          T* a;
          int b;
          int c;
        }

        int b_plus_c(const struct foo<T> *f) {
                return f->b + f->c;
        }

The reason this helps is that regardless of the size of T, the size of T* is always exactly the size of one pointer. In fact, in C it's very common to see this pattern used for generic data structures:

        struct foo {
          void* a;
          int b;
          int c;
        }

where the a field can point to anything. Of course, there's no way for the compiler to detect errors such as foo->a pointing to a value of type T somewhere but being used as a value of type U later, but for a C programmer that is normal. In many languages the "box" type is not just a pointer, but actually a pair of a pointer and a type tag, so that a box can't be misused as a value of the wrong type; we could do that ourselves in C too, but we'd have to somehow allocate tags to types.

There is one very, very big downside to boxing: it adds a layer of indirection. Whenever we want to operate on foo->a, we have to access it through a pointer, which has negative effects on cache locality and other performance characteristics. This hit can be big, especially in tight inner loops, so languages that focus heavily on performance don't usually default to boxing for type genericity.

There is another way though: monomorphization, which is what C++ and most other "systems" languages do for generics. The key idea of monomorphization is that we can remove type parameters from types and functions by making specialized copies of them for each type parameter they are used with. This is called "monomorphization" by analogy with "polymorphism" - struct foo<T> is polymorphic, while struct foo is not. To use our earlier example:

        struct foo<T> {
          T a;
          int b;
          int c;
        }

        int b_plus_c(const struct foo<T> *f) {
                return f->b + f->c;
        }

Suppose that this type is used (anywhere in our program) with two different type parameters - let's make them char and int for example. We could monomorphize this type and function like so:

        struct foo_with_char {
          char a;
          int b;
          int c;
        }

        int b_plus_c_with_char(const struct foo_with_char *f) {
                return f->b + f->c;
        }

        struct foo_with_int {
          int a;
          int b;
          int c;
        }

        int b_plus_c_with_int(const struct foo_with_int *f) {
                return f->b + f->c;
        }

Of course writing this all out by hand for a lot of types quickly induces a headache, but if the compiler generates this for us during compilation it's basically fine - we end up with one copy of the machine code for each type parameter a type or function is used with, which hurts code size but preserves good cache locality and pipeline behavior since there aren't any indirections when accessing generics.

Since C doesn't have this feature, people sometimes reach for the preprocessor to implement it themselves. You can write some tasteful macros like this (ignoring token-pasting problems):

        #define MAKE_FOO_TYPE(T) \
            struct foo_with_ ## #T { \
                T a; \
                int b; \
                int c; \
            }; \

and then define further macros which implement operations on the macro-generated queue types. This is basically a manual implementation of monomorphization using the preprocessor, and it does work but it doesn't look too pretty in use. You also need to bear in mind one of the ugly details of monomorphization, whether the compiler does it or the programmer does: if you are using a C-style linking model, each function definition can only exist in one translation unit, so you need to somehow pick one translation unit to define (say) foo_with_int, and then every other translation unit needs to call it - or you need to always inline the body of foo_with_int, which makes the code size problem imposed by monomorphization even worse.

I was confronted by this problem for my Advent of Code libraries. I didn't want to use the boxing approach because I know some of the problems will require many millions of entries in a generic data structure and I didn't want the overhead of doing separate heap allocations for them, nor of indirecting to access them, and I didn't want to do monomorphization because I think it's ugly without compiler support. I realized, though, that most of the types I needed to use with these generic data structures are pretty small - the biggest one I regularly need is a tuple of (3d point, integer value) for things like a 3-dimensional flood fill or similar. That doesn't solve 100% of my use cases, since there are a couple where I need my data structures to hold strings or whatever, but it does solve most of them. I realized that I can make a structure that can hold most of the values I need, then write all my data structures in terms of that.

That led me to define a type I called "Key", for want of a better name:

        typedef struct {
	              union {
		                    uint8_t u8s[16];
		                    int32_t i32s[4];
		                    int64_t i64s[2];
		                    uint64_t u64s[2];
	              } u;
        } Key;

A Key is 16 bytes of arbitrary data, which can also be viewed as four 32-bit ints, two 64-bit signed ints, or two 64-bit unsigned ints. I then defined my queue type, my hashtable type, my graph type and so on all in terms of Keys, so (for example) my queue interface looks like this:

        extern void queue_push(Queue *q, const Key k);
        extern bool queue_peek(const Queue *q, Key *k);
        extern Key queue_xpeek(const Queue *q);
        extern bool queue_pop(Queue *q, Key *k);
        extern Key queue_xpop(Queue *q);

Aside: I chose 16 bytes specifically because that's the largest a struct can be while still being passed in a register on x86-64. Keeping Keys small enough to go in registers makes it very cheap to pass them around by value.

Second aside: The _xpeek and _xpop functions are an instance of a pattern I use a lot: if there's an x prefixed to the verb part of a function's name, that function unconditionally succeeds by crashing the program if its required preconditions aren't true. In this example, queue_pop() returns whether it could pop a Key or not, while queue_xpop() either pops a Key or crashes the program. If you know your preconditions do hold, this can give you a nice efficiency win because you don't need to pass a pointer to an out-value and have the callee write through it. In another language you could have queue_pop() instead return a pair of (bool, Key) and have both return values in registers, but C doesn't have an ergonomic way to express that.

Now, if I need to put a Point3 into a Queue, all I have to do is pack that Point3 into a Key (which can be done with a short inlined function) and go, then unpack it back into a Point3 on the other side. Of course, if I need to put something into a Key that doesn't fit in it (like a string) then I have to instead stash a pointer to the string inside the Key.

There's one other ingredient I need. In a lot of places I can treat a Key as being just an array of 16 bytes, but some data structures (especially hash tables) require a couple of other operations: hashing and equality. The user of the hash table, who knows what the data stored inside the Key actually means, has to supply these functions. They're expressed through a struct of function pointers:

        typedef struct {
        	bool (*eq)(const Key k0, const Key k1);
        	uint64_t (*hash)(const Key k);
        	void (*destroy)(Key k);
        } Keyfuncs;

so when constructing a hashtable, you pass in a Keyfuncs telling the hash table how to compare the Keys you give it for equality and so on. The destroy() function is a bit of a hack, to deal with the fact that in one problem I needed the Key to own a heap-allocated string, and when removing it from a hash table I needed it to be destroyed so I wouldn't leak memory.

If you step back a little, this basically is the boxing solution to polymorphism, including something like a vtable for dynamic dispatch of operations on the contained polymorphic type - I just made the "box" able to usually contain the data in line rather than via an indirection, so that I can avoid the efficiency loss for most of the types I need to deal with. This ended up working very well for me; the Key type is the basis of all my generic data structures for Advent of Code, and it seems to yield efficient interfaces that aren't too difficult to handle, especially when you have value-to-value functions like this:

        Key point3_tokey(Point3 p) {
              	Key k = KEY_ZERO;
              	k.u.i32s[0] = p.x;
              	k.u.i32s[1] = p.y;
              	k.u.i32s[2] = p.z;
              	return k;
        }

so you can write stuff like: if (hash_hash(h, point3_tokey(p))) { ... }.

Anyway, that's it for this topic - thanks for reading, as always! If you're interested in looking at all of my library code for Advent, you can find it on sourcehut, although it is a bit of a mess currently :)