## Linda Tuple Spaces Linda Tuple Spaces (named after the Linda language, which was their first practical implementation) are a parallel/distributed programming architecture. In a tuple space system, processes that are doing computations communicate through a "tuple space", which is a multiset-like data structure containing typed tuples of values. The tuple space itself may be hosted in a server process, stored in some kind of shared memory or database, or in any other convenient place - it's just a descriptive metaphor. For example, here's an empty tuple space: ## Put The first fundamental operation of tuple spaces is "put", by which tuples enter the space. If we do: put ("foo", 1, false) then the space will look like: ("foo", 1, false) and if we then do: put ("foo", 2, true) then the space will be: ("foo", 1, false) ("foo", 2, true) note that the space is a multiset, which specifically means: * Since it is a set, it is not ordered in any way * Since it is a multiset, it can contain duplicate values ## Take & Copy The next two fundamental operations of a tuple space are "take" and "copy". Both of them retrieve a value from the tuple space, but "take" also deletes the value afterwards. The "take" operation is atomic: if two processes both try to "take" the same value, only one of them will get it. Both of these operations block until they can succeed, although they can also have non-blocking variants. How, though, does one specify which tuple to take? Obviously being able to take only a tuple that one already knows in its entirety is of limited value. Instead, both of these operations take a "selector", which identifies a multiset of tuples in the space; one of these tuples is returned. For example, if the selector is: (string "foo", int 1, bool *) where * means "any value here", then the returned tuple from "take" would be: ("foo", 1, true). In practice, since there's no need to return to the caller the values that are already known, all that would be returned here is: (true) If the selector was instead: (string *, int 1, bool *) then the returned value would be: ("foo", true) ## Eval What we've outlined above makes a serviceable (and maybe pretty cool) system for sharing data, but it doesn't get at how processes are created at all. The last tuple space primitive takes care of that. By using "eval", a process may create another process (a "live tuple"), which runs to completion and then turns into a regular old data tuple. For example: eval (factorial, 5) would insert a "live tuple" containing the *computation* "factorial" and the integer 5 into the space. At its leisure, something in the space will take that computation (deleting it from the space), execute it, and eventually put the computation's result tuple: (120) Exactly what the "something" is is implementation-defined, as is the means of specifying a computation to execute, but there are some obvious possibilities: * A path to a program, which is executed with the tuple supplied to it and whose outputs are used as its result * A chunk of source code in any language one likes * A HTTP URL, to which the supplied tuple is posted, with the HTTP response used as its result and so on. ## In Use Here's an imaginary API for dealing with a tuple space in C: int tuple_put(const char *sig, ...); int tuple_get(const char *sig, ...); int tuple_take(const char *sig, ...); int tuple_eval(const char *comp, const char *sig, ...); Note that the eval operation takes a string representing the computation. The format strings are used to supply types for the tuple elements. We might define format specifiers like: s/S = variable string/fixed string i/I = variable int/fixed int b/B = variable bool/fixed bool where the distinction between a variable T and a fixed T is only relevant when selecting - a "fixed T" means the caller wants a literal match, and a variable T means the caller will accept any value but wants it returned. Using this API, process 1 might do: tuple_put("sib", "foo", 1, false); and process 2 might do: bool b; tuple_copy("SIb", "foo", 1, &b); or perhaps: bool b; int i; tuple_copy("Sib", "foo", &i, &b); The "eval" operation is a bit strange. We could *almost* implement it as a regular client of this IPC system, like so: while (tuple_take("Ss?", "eval", &prog, &args)) tuple_put("?", execve(prog, args)) except that it is impossible to properly give the type of args in this sytem - args itself is a tuple. Rather than introduce the ability to nest tuples directly into the tuple space's data model, we can pass the eval operation a serialized form of the provided tuple in some agreed-upon format, and have it return a serialized tuple in that same format. It is fine for us to reuse whatever wire format we decide to use when normally communicating between the IPC clients and servers. The eval service then looks like: while (tuple_take("Sss", "eval", &prog, &args) { (type, result) = execve(prog, deserialize(args)) tuple_put(type, result) } A function call/result wrapper could be: int tuple_call(const char *insig, const char *outsig, ...) which would be a fused pair of: tuple_put(insig, ...) tuple_take(outsig, ...) pulling the appropriate fields out of varargs each time. For example: tuple_call("si", "i", "factorial", 5, &i) would mean: tuple_put("si", "factorial", 5) tuple_take("i", &i) An obvious problem obtains with this scheme though - this will grab a randomly-selected int from the tuple space, not necessarily our result! Instead, tuple_call() needs to do this: u = uuidgen() tuple_put("ssi", u, "factorial", 5) tuple_take("Si", u, &i) This pattern is likely to be quite common, so it probably makes sense to have this notion of call/reply unique IDs directly expressible in the tuple signature. ## Lifetimes One thing I've glossed over here is garbage collection. As described, any tuple inserted into the system is retained forever until taken from it. That's obviously no good for real use - suppose a program puts a tuple requesting something from a service, then crashes? The result will sit around indefinitely waiting to be collected. To fix this we need to introduce tuple lifetimes. There is at least one kind of useful lifetime: "For n seconds", which allows function calls/replies to be properly transient and for signalling of events. Another interesting lifetime might be "for as long as this client is connected", which opens up the prospect of service discovery. ## Next I'd like to write an implementation of the concepts and design sketch given above. Stay tuned for that :)