## Linda Tuple Spaces

Linda Tuple Spaces (named after the Linda language, which was their first
practical implementation) are a parallel/distributed programming architecture.
In a tuple space system, processes that are doing computations communicate
through a "tuple space", which is a multiset-like data structure containing
typed tuples of values. The tuple space itself may be hosted in a server
process, stored in some kind of shared memory or database, or in any other
convenient place - it's just a descriptive metaphor.

For example, here's an empty tuple space:

## Put

The first fundamental operation of tuple spaces is "put", by which tuples enter
the space. If we do:

	put ("foo", 1, false)

then the space will look like:

	("foo", 1, false)

and if we then do:

	put ("foo", 2, true)

then the space will be:

	("foo", 1, false)
	("foo", 2, true)

note that the space is a multiset, which specifically means:

* Since it is a set, it is not ordered in any way
* Since it is a multiset, it can contain duplicate values

## Take & Copy

The next two fundamental operations of a tuple space are "take" and "copy". Both
of them retrieve a value from the tuple space, but "take" also deletes the value
afterwards. The "take" operation is atomic: if two processes both try to "take"
the same value, only one of them will get it. Both of these operations block
until they can succeed, although they can also have non-blocking variants.

How, though, does one specify which tuple to take? Obviously being able to take
only a tuple that one already knows in its entirety is of limited value.
Instead, both of these operations take a "selector", which identifies a
multiset of tuples in the space; one of these tuples is returned.

For example, if the selector is:

	(string "foo", int 1, bool *)

where * means "any value here", then the returned tuple from "take" would be:

	("foo", 1, true).

In practice, since there's no need to return to the caller the values that are
already known, all that would be returned here is:

	(true)

If the selector was instead:

	(string *, int 1, bool *)

then the returned value would be:

	("foo", true)

## Eval

What we've outlined above makes a serviceable (and maybe pretty cool) system for
sharing data, but it doesn't get at how processes are created at all. The last
tuple space primitive takes care of that. By using "eval", a process may create
another process (a "live tuple"), which runs to completion and then turns into a
regular old data tuple. For example:

	eval (factorial, 5)

would insert a "live tuple" containing the *computation* "factorial" and the
integer 5 into the space. At its leisure, something in the space will take that
computation (deleting it from the space), execute it, and eventually put the
computation's result tuple:

	(120)

Exactly what the "something" is is implementation-defined, as is the means of
specifying a computation to execute, but there are some obvious possibilities:

* A path to a program, which is executed with the tuple supplied to it and whose
  outputs are used as its result
* A chunk of source code in any language one likes
* A HTTP URL, to which the supplied tuple is posted, with the HTTP response used
  as its result

and so on.

## In Use

Here's an imaginary API for dealing with a tuple space in C:

	int tuple_put(const char *sig, ...);
	int tuple_get(const char *sig, ...);
	int tuple_take(const char *sig, ...);
	int tuple_eval(const char *comp, const char *sig, ...);

Note that the eval operation takes a string representing the computation.

The format strings are used to supply types for the tuple elements. We might
define format specifiers like:

	s/S = variable string/fixed string
	i/I = variable int/fixed int
	b/B = variable bool/fixed bool

where the distinction between a variable T and a fixed T is only relevant when
selecting - a "fixed T" means the caller wants a literal match, and a variable T
means the caller will accept any value but wants it returned.

Using this API, process 1 might do:

	tuple_put("sib", "foo", 1, false);

and process 2 might do:

	bool b;
	tuple_copy("SIb", "foo", 1, &b);

or perhaps:

	bool b;
	int i;
	tuple_copy("Sib", "foo", &i, &b);

The "eval" operation is a bit strange. We could *almost* implement it as a
regular client of this IPC system, like so:

	while (tuple_take("Ss?", "eval", &prog, &args))
		tuple_put("?", execve(prog, args))

except that it is impossible to properly give the type of args in this sytem -
args itself is a tuple. Rather than introduce the ability to nest tuples
directly into the tuple space's data model, we can pass the eval operation a
serialized form of the provided tuple in some agreed-upon format, and have it
return a serialized tuple in that same format. It is fine for us to reuse
whatever wire format we decide to use when normally communicating between the
IPC clients and servers. The eval service then looks like:

	while (tuple_take("Sss", "eval", &prog, &args) {
		(type, result) = execve(prog, deserialize(args))
		tuple_put(type, result)
	}

A function call/result wrapper could be:

	int tuple_call(const char *insig, const char *outsig, ...)

which would be a fused pair of:

	tuple_put(insig, ...)
	tuple_take(outsig, ...)

pulling the appropriate fields out of varargs each time. For example:

	tuple_call("si", "i", "factorial", 5, &i)

would mean:

	tuple_put("si", "factorial", 5)
	tuple_take("i", &i)

An obvious problem obtains with this scheme though - this will grab a
randomly-selected int from the tuple space, not necessarily our result! Instead,
tuple_call() needs to do this:

	u = uuidgen()
	tuple_put("ssi", u, "factorial", 5)
	tuple_take("Si", u, &i)

This pattern is likely to be quite common, so it probably makes sense to have
this notion of call/reply unique IDs directly expressible in the tuple
signature.

## Lifetimes

One thing I've glossed over here is garbage collection. As described, any tuple
inserted into the system is retained forever until taken from it. That's
obviously no good for real use - suppose a program puts a tuple requesting
something from a service, then crashes? The result will sit around indefinitely
waiting to be collected.

To fix this we need to introduce tuple lifetimes. There is at least one kind of
useful lifetime: "For n seconds", which allows function calls/replies to be
properly transient and for signalling of events.

Another interesting lifetime might be "for as long as this client is connected",
which opens up the prospect of service discovery.

## Next

I'd like to write an implementation of the concepts and design sketch given
above. Stay tuned for that :)

$#t Linda Tuple Spaces
$#s A cool data structure for distributed programming
$#o computer-science
$#u 0a802f02-d4df-442d-83c4-64fa1360db61