/* gpm.c - General Purpose Macrogenerator, in C
 *
 * This post is actually just a program, heavily annotated with my commentary.
 * You can compile and maybe run it, although it does not work reliably or well.
 *
 * Build: cc -o gpm gpm.c # but please don't
 * Run: gpm < infile > outfile
 *
 * This is an implementation of Strachey's "General Purpose Macrogenerator", a
 * now-dead macro system, translated from the CPL version in the 1960s paper by
 * Strachey into unbelievably horrible C. This is basically a straight
 * translation with minimal changes to the organization of the code. When
 * reading it, remember that this is from 5 years before Dijkstra's infamous
 * rant about GOTO.
 *
 * When reading the C implementation or the original CPL from the paper, it
 * helps to bear in mind a saying, evidently common around Cambridge in the 60s:
 *
 *   "Watch this, sucker! A lunatic does not behave!" -- Ray Smuckles
 *
 * Alright. There's no way to put this off any longer, so let's get into GPM.
 * GPM is a macro system, which expands macros that appear in its input. Macro
 * definitions *themselves* appear in the input stream, which is both useful and
 * horrible, and IMO is a big part of why these macro systems more or less died
 * out except for m4 - it's way too easy to accidentally invoke a macro, and you
 * cannot feed "untrusted" input into these programs because any input can
 * contain macro definitions! The syntax is very obscure by modern standards,
 * but it looks like this:
 *
 *   $DEF,A,<B~1>;      <---- macro definition: A expands to B~1, where ~1 is
 *                            the first argument
 *   $A,C;              <---- macro expansion: invoke A with argument C, so
 *                            this produces: BC
 *
 * Now, the implementation.
 *
 * There are some "globals" that appear in the original paper. First, there
 * are the IO stack depths:
 *
 *   H      the output location on the stack
 *   C      the input location on the stack
 *
 * If either of these is 0, the corresponding operation uses I/O directly.
 * Speaking of which, both of those operations make use of another global, so
 * let's introduce...
 *
 *   A      the accumulator
 *
 * which deserves some kind of award for making the code hard to follow. This is
 * basically used as a scratch register all over the place, for both arguments
 * to and returns from functions.
 *
 * There are also:
 *
 *   ST      the stack
 *   S       the stack top
 *
 * which work sort of how you expect, and there are:
 *
 *   W       string base index (temporary)
 *   W1      individual digits during decimalization, and grotesquely reused
 *           as a temporary in Monitor11
 *
 * and:
 *
 *   E       stack index of the head of the definition list
 *   P       stack index of the head of the in-progress call list (kind of like
 *           the return stack on modern systems)
 *   F       stack index of the head of the pending call list
 *
 * which are used to produce a small handful of linked lists threaded through
 * the stack.
 *
 * Speaking of that stack... it stores a wild mix of data as the algorithm
 * works, but one thing to look out for are the word definitions, which look
 * like:
 *
 *   1 word: stack index of next word definition, or -1
 *   1 word: number of characters in definition name + 1
 *   n words: characters of definition name
 *   1 word: stack index of start of word's body, or a negative number, which
 *           indexes the "machine macro" table
 *
 * The "machine macros" are what a modern programmer would call "builtins", and
 * are implemented, as they were in the original paper, via computed goto.
 *
 * One other reading note: the various "monitor" things were supposed to trap
 * into a hypothetical machine monitor (think ROM debugger), but I don't think
 * such a thing ever existed. This implementation has them behave as they do in
 * the paper.
 *
 * Some, uh, implementation notes:
 *
 * Stack overflows are handled via assert().
 *
 * The original CPL version used nested routines very heavily, and had a great
 * deal of state shared through a bunch of variables with single-letter names.
 * The closest I can get to that here is, regrettably, cpp macros. Unfortunately
 * the C preprocessor leaves much to be desired, so the Find function has
 * acquired a hacky second argument used to unique-ify a label in its body. That
 * could be avoided by using 'break' instead of 'goto out' in Find(), but that
 * would be the coward's way out.
 *
 * In general anything prefixed with _ doesn't appear in the original CPL, but I
 * ended up needing to add it here, usually to work around language mismatches.
 * The unused _Dump() function can be a lifesaver if you're debugging an error
 * in this code, although I really advise against doing so.
 *
 * CPL supports simultaneous assignment, meaning this:
 *   a, b := b, a
 * swaps the values of a and b. Since C does not have this I sometimes had to
 * resort to temporaries.
 *
 * The original code had no concept of EOF, probably because one simply stopped
 * typing and the program stopped executing. I added EOF handling to the
 * implementation.
 *
 * Note that in some error cases, this implementation (and I think the original
 * paper algorithm) will loop infinitely.
 *
 * The unicode section marker was replaced with '$' here for avoidance of UTF-8
 * handling.
 *
 * For a fun time, try it out:
 *   $DEF,A,<A~1A>;
 *   $DEF,B,<B$A,X~1X;B>;
 *   $DEF,APA,<P~1~1P>;
 *
 *   $A,C;
 *   $B,C;
 */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int ReadSymbol() { return getchar(); }
void WriteSymbol(int a) { putchar(a); }

#define Load \
	if (H == 0) \
		WriteSymbol(A); \
	else \
		assert(S <= n); \
		ST[S++] = A;

#define NextCh \
	if (C == 0) \
		A = ReadSymbol(); \
	else \
		A = ST[C++];

/* This macro takes a stack index x and searches the definition list threaded
 * through the stack, starting with E (the definition stack head), for a
 * definition whose name matches the string starting at index x on the stack.
 *
 * The _I argument is used to deal with a mismatch between CPL and C, since C
 * has nothing like CPL's "nested routines".
 */
#define Find(x,_I) \
	A = E; W = x; \
	while (A >= 0) { \
		for (r = 0; r <= ST[W] - 1; r++) { \
			if (ST[W + r] != ST[A + r + 1]) break; \
		} \
		if (r > ST[W] - 1) { \
			W = A + 1 + ST[W]; \
			goto find_ ## _I; \
		} \
		A = ST[A]; \
	} \
	find_ ## _I: \
	if (A < 0) goto Monitor7;

/* Of course there are control flow macros. Why would there not be? */
#define JumpIfMarked(x) \
	if (x < 0) goto *MachineMacro[-x];

int Number(int x) { return x - '0'; }
int Char(int x) { return x + '0'; }

#define Marker 1048576

void _Dump(int *ST, int S) {
	printf("Dumping %d:\n", S);
	for (int i = 0; i < S; i += 4) {
		printf("%02d    % 4d % 4d % 4d % 4d\n",
		       i, ST[i], ST[i + 1], ST[i + 2], ST[i + 3]);
	}
	printf("\n");
}

/* As a small protest in the name of sense, I avoided defining Item as a long
 * macro, since it really does not mutate caller state. However, I did define an
 * Item() macro below so that the code will read more like the original. */
void _Item(int *ST, int S, int A, int x) {
	int _b = ST[x] == 0 ? S - x - 1 : ST[x] - 1;
	int k;
	for (k = 1; k <= _b; k++) {
		A = ST[x + k];
		WriteSymbol(A);
	}
	if (ST[x] == 0)
		puts("...\t(Incomplete)");
}

#define Item(x) _Item(ST, S, A, x)

void GPM(int n) {
	int A, W;
	int H = 0, P = 0, F = 0, C = 0;
	int S = 39, E = 33, q = 1;
	int k, r, x, W1, a, h, _b, _t0, _t1, _t2;
	int *ST = malloc(n * sizeof(*ST));

	/* Computed goto table for the "machine macros". */
	static const void *MachineMacro[7] = {
		NULL, /* Formarray [label, (1, 6)] starts index at 1 */
		&&DEF, &&VAL, &&UPDATE, &&BIN, &&DEC, &&BAR,
	};

	/* These are the initial word definitions, which will be copied into the
	 * initial stack. The hardcoded indexes are reproduced from the original
	 * paper, which uses them throughout. */
	static const int MST[39] = {
		-1, 4, 'D', 'E', 'F', -1,
		 0, 4, 'V', 'A', 'L', -2,
		 6, 7, 'U', 'P', 'D', 'A', 'T', 'E', -3,
		12, 4, 'B', 'I', 'N', -4,
		21, 4, 'D', 'E', 'C', -5,
		27, 4, 'B', 'A', 'R', -6
	};

	for (k = 0; k < 39; k++)
		ST[k] = MST[k];

Start:  NextCh
	if (A == '<') {
		q++;
		goto Q2;
	}

	switch (A) {
		case '$': goto Fn;
		case ',': goto NextItem;
		case ';': goto Apply;
		case '~': goto LoadArg;
		case Marker: goto EndFn;
		case '>': goto Exit;
		case EOF: return;
		default: goto Copy;
	}

	/* This is exactly how the original paper reads. Dijkstra was right. */
Copy: 	Load
Scan: 	if (q == 1) goto Start;
Q2:	NextCh
	if (A == '<') {
		q++;
		goto Copy;
	}
	if (A != '>') goto Copy;
	q--;
	if (q == 1) goto Start;
	goto Copy;

Fn:
	/* This starts a new pending function call. Pending call frames are
	 * 4 words long, but don't actually use all of those words; they are
	 * padded to the same size as in-progress call frames so that the same
	 * frame can be unlinked from the pending chain, filled in, and relinked
	 * to the in-progress chain. The 0s here are the padding words. */
	ST[S] = H;
	ST[S + 1] = F;
	ST[S + 2] = 0;
	ST[S + 3] = 0;
	F = S + 1;
	H = S + 3;
	S = S + 4;
	goto Start;

NextItem:
	if (H == 0) goto Copy;
	ST[H] = S - H - ST[H];
	ST[S] = 0;
	H = S;
	S++;
	goto Start;

Apply:
	/* Apply a function to arguments, by unlinking the pending call frame,
 	 * filling in an in-progress call frame (I think), and then returning to
 	 * the input loop. I *believe* this will cause the input loop to read
 	 * from the in-progress call's body text, effectively "expanding" the
 	 * macro in place, but because we're back in the main in-progress loop,
 	 * new definitions can be encountered here. */
	if (P > F) goto Monitor1;
	if (H == 0) goto Copy;

	_t0 = ST[F - 1];
	_t1 = ST[F];
	ST[F - 1] = S - F + 2;
	ST[F] = P;
	ST[F + 1] = C;
	ST[H] = S - H;
	ST[S] = Marker;
	P = F;
	H = _t0;
	F = _t1;
	S++;
	if (H != 0)
		ST[H] = ST[H] + ST[P - 1];
	Find(P + 2, Apply)
	JumpIfMarked(ST[W])
	C = W + 1;
	goto Start;

LoadArg:
	/* This pulls a numbered macro argument out of the most recent
	 * in-progress call frame and dumps it into the output. */
	if (P == 0) {
		if (H == 0) goto Copy;
		goto Monitor2;
	}
	NextCh
	W = P + 2;
	if (Number(A) < 0) goto Monitor3;
	for (r = 0; r < Number(A); r++) {
		W = W + ST[W];
		if (ST[W] == Marker) goto Monitor4;
	}
	for (r = 1; r < ST[W]; r++) {
		A = ST[W + r];
		Load
	}
	goto Start;

EndFn:
	/* This function terminates an in-progress function call. I think a lot
	 * of the complexity is because the it shifts stack contents down into
	 * the now-freed space, but I am not entirely confident. */
	if (F > P) goto Monitor5;
	ST[S] = E;
	A = S;
	while (ST[A] >= P - 1 + ST[P - 1]) {
		ST[A] = ST[A] - ST[P - 1];
		A = ST[A];
	}
	W = ST[A];
	while (W > P - 1) {
		W = ST[W];
	}
	ST[A] = W;
	E = ST[S];

	if (H != 0) {
		if (H > P)
			H = H - ST[P - 1];
		else
			ST[H] = ST[H] - ST[P - 1];
	}

	C = ST[P + 1];
	A = P - 1;
	W = P - 1 + ST[P - 1];
	S = S - ST[P - 1];
	P = ST[P];
	while (A != S) {
		ST[A++] = ST[W++];
	}
	goto Start;

Exit:
	if (C != 0 || H != 0) goto Monitor8;
	return;

DEF:
	/* The "machine code" to link a new definition into the definition list,
	 * with the head of the new frame at P + 5 (which I believe is the start
	 * of the arguments). */
	if (H != 0)
		ST[H] = ST[H] - ST[P - 1] + 6;
	ST[P - 1] = 6;
	ST[P + 5] = E;
	E = P + 5;
	goto EndFn;

VAL:
	/* This lets you expand a macro a single time, so basically look up its
	 * definition and emit the literal definition without any re-expansions.
	 */
	Find(P + 6, VAL)
	while (ST[W + 1] != Marker) {
		A = ST[W + 1];
		W++;
		Load
	}
	goto EndFn;

UPDATE:
	/* This modifies the body of an existing macro definition. You cannot
	 * grow a macro's definition beyond its original size this way. */
	Find(P + 9, UPDATE)
	A = P + 9 + ST[P + 9];
	if (ST[A] > ST[W]) goto Monitor9;
	for (r = 1; r <= ST[A]; r++)
		ST[W + r] = ST[A + r];
	goto EndFn;

BIN:
	/* binary -> decimal */
	W = 0;
	A = (ST[P + 7] == '+' || ST[P + 7] == '-') ? P + 8 : P + 7;
	while (ST[A] != Marker) {
		x = Number(ST[A]);
		if (x < 0 || x > 9) goto Monitor10;
		W = 10 * W + x;
		A++;
	}
	ST[S] = (ST[P + 7] == '-' ? -W : W);
	S++;
	goto EndFn;

DEC:
	/* decimal -> binary */
	W = ST[P + 7];
	if (W < 0) {
		W = -W;
		A = '-';
	}
	while (W1 >= 1) {
		while (10 * W1 <= W)
			W1 = 10 * W1;
		A = Char(W / W1);
		W = W % W1;
		W1 = W1 / 10;
		Load
	}
	goto EndFn;

BAR:
	/* binary arithmetic! */
	W = ST[P + 9];
	A = ST[P + 11];
	switch (ST[P + 7]) {
		case '+': A = W + A; break;
		case '-': A = W - A; break;
		case '*': A = W * A; break;
		case '/': A = W / A; break;
		default: A = W % A; break;
	}
	Load
	goto EndFn;

/* These Monitor* labels are error handling. */
Monitor1:
	puts("\nMONITOR: Unmatched semicolon in definition of ");
	Item(P + 2);
	puts("\nIf this had been quoted the result would be\n");
	goto Copy;

Monitor2:
	puts("\nMONITOR: Unquoted tilde in argument list of ");
	Item(F + 2);
	puts("\nIf this had been quoted the result would be\n");
	goto Copy;

Monitor3:
	puts("\nMONITOR: Impossible argument number in definition of ");
	Item(P + 2);
	goto Monitor11;

Monitor4:
	puts("\nMONITOR: No argument");
	H = 0;
	Load
	puts("in call for ");
	Item(P + 2);
	goto Monitor11;

Monitor5:
	puts("\nMONITOR: Terminator in ");
	if (C == 0) {
		puts("input stream. Probably machine error.");
		goto Monitor11;
	}
	puts("argument list for ");
	Item(F + 2);
	puts("probably due to a semicolon missing from the definition of ");
	Item(P + 2);
	puts("\nIf a final semicolon is added the result is\n");
	C--;
	goto Apply;
	
Monitor7:
	puts("\nMONITOR: Undefined name ");
	Item(W);
	goto Monitor11;

Monitor8:
	puts("\nMONITOR: Unmatched >. Probably machine error.");
	goto Monitor11;

Monitor9:
	puts("\nMONITOR: Update argument too long for ");
	Item(P + 9);
	goto Monitor11;

Monitor10:
	puts("\nMONITOR: Non-digit in number");
	goto Monitor11;

Monitor11:
	W = 20;
	puts("\nCurrent macros are");
	while (P != 0 || F != 0) {
		if (P > F) {
			W1 = P + 2;
			P = ST[P];
			puts("\nAlready entered");
		} else {
			W1 = F + 2;
			P = ST[F];
			puts("\nNot yet entered");
		}
		for (r = 1; r <= W; r++) {
			Item(W1);
			if (ST[W1] == 0) break;
			W1 = W1 + ST[W1];
			if (ST[W1] == Marker) break;
			if (W != 1)
				printf("\nArg %d\t", r);
		}
		W = 1;
	}
	puts("\nEnd of monitor printing");
	A = 'Q';
	Load
	if (P > F) goto EndFn;
	goto Start;
}

/* ... and it's a C program so here we go! */

int main() {
	GPM(65536);
}

/* I do want to clarify, since I was pretty harsh in a lot of my comments here,
 * that most of what I'm complaining about are artifacts of mismatches between
 * the capabilities of CPL and C, which forced me into awkward hacks. Also, the
 * original paper's algorithm predates the idea of "structured programming" and
 * is contemporaneous with like, ALGOL-60, so it's hard to blame the author for
 * it not exactly being super elegant.
 *
 * The original algorithm is actually quite clever, and though it does take a
 * while to figure out, the amount of power achieved with a relatively small
 * amount of code and *no* library support is pretty impressive.
 *
 * I'll end this post with a quote from the paper:
 *
 *   "It has been our experience that the GPM, while a very powerful tool
 *    in the hands of a ruthless programmer, is something of a trap for the
 *    sophisticated one. It contains in itself all the undesirable features
 *    of every possible machine code - in the sense of inviting endless
 *    tricks and time-wasting though fascinating exercises in ingenuity -
 *    without any of the irritating ad hoc features of real machines. It
 *    can also be almost impenetrably opaque, and even very experienced
 *    programmers indeed tend to spend hours simulating its action when one
 *    of their macro definitions goes wrong."
 *
 * The paper is quite good reading even though this kind of langauge turned out
 * to be more or less a dead end, except for m4 and the C preprocessor. You can
 * find the paper around the internet still; my copy has sha256:
 *   3479413232df3c2302b12a225aa261ebba1f21fe7aacf3611b7bda0b8f67b2ff
 *
 * That's it for now. Thanks for reading :)
 *
 * $#t General Purpose Macrogenerator
 * $#s A rewrite of the historic GPM in C, slightly touching on differences
 * $#s between C and CPL
 * $#o c, compilers, history, programming
 * $#u 2f258466-e6b6-4f3d-b4a9-68c75dc59901
 */