A "Spellserver" is my name for a very generic server design: one in which the server-side code is provided by the clients (but somehow constrained by the server owner). I've been exploring ways to achieve E-like delegation and attenuation-by-code, in which I can write a program to enforce my restrictions (rather than pre-programming the server with all conceivable ones). E provides this quite elegantly, but depends upon a number of unusual platform details, many of which have not actually been implemented yet. I'm looking for a way to achieve similar goals but in a familiar language like Javascript, and invoked via plain HTTP.

Background

What is E? What's so special about it? Which of its properties am I trying to achieve in something else?

The E programming language was developed to explore object-capability-based programming patterns. It has been the testbed for Eventual-Sends, Promises, Promise Pipelining, Proxies, Facets, Rights-Amplification, Auditors, and other key ideas of the objcap movement.

"Dr. SES" is an ongoing project to bring E's concepts to Javascript, making them accessible to a larger community. Ten years of TC39 standardization work have lead to ECMAScript being a reasonable target for secure object-capability -based programming.

E's Abstract Ideas

In E, objects contain references to other objects, which can be used to invoke methods on those targets (and in fact are the only way to communicate with them). These method invocations include arguments, and these arguments can include other object references. Each method can also return data (or a Promise to some data), including object references.

Object references can span multiple computers. The "Vat" abstraction is a collection of objects which can make "local" (immediate, blocking) calls to each other. When the target object is in a foreign Vat, any method calls must use the "eventual send" operator, which returns a Promise for the result, and guarantees that the target will be unable to affect any execution until the caller's stack frame has been unwound. In practice, each Vat runs on a single computer.

The abstract Vat is immortal: it never shuts down, and any object created in one will be persisted forever (or until all references to it are lost, at which point its continued existence is indistinguishable from its destruction).

The Vat basically contains a persistent object store (indexed by some inter-Vat identifier scheme), a standard execution stack, and a queue of inbound messages (either generated from remote Vats or from local eventual-send operations). Each immediate method call pushes a new frame on the stack, and yields a return value when it finishes. Each eventual-send pushes a new message on a queue (either local or in some foreign Vat). Eventual-sends provide a Promise to the caller, and include a reference to the corresponding Resolver object as a special argument in the remote message. The Vat knows how to take the return value from a message invocation and deliver it to the Resolver (as a new message).
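
To make that concrete, here's a rough sketch of a Vat's delivery loop, in the same Python-flavored pseudocode I use below. The helper names (make_promise_pair, checkpoint) are hypothetical, invented for illustration:

from collections import deque

class Vat:
    def __init__(self):
        self.objects = {}     # persistent store: inter-Vat identifier -> object
        self.queue = deque()  # inbound messages: from remote Vats or local eventual-sends

    def eventual_send(self, target_id, method, args):
        # queue the message instead of invoking it, and hand back a Promise;
        # the corresponding Resolver rides along with the message
        promise, resolver = make_promise_pair()  # hypothetical helper
        self.queue.append((target_id, method, args, resolver))
        return promise

    def run_one_message(self):
        target_id, method, args, resolver = self.queue.popleft()
        target = self.objects[target_id]
        # immediate calls made during this invocation use the normal stack;
        # only top-level deliveries go through the queue
        result = getattr(target, method)(*args)
        resolver.resolve(result)  # deliver the return value as a new message
        checkpoint(self.objects)  # hypothetical persistence step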

Realizing E For Real

E is a fully-functional programming environment, implemented as a Java-based interpreter, but for various reasons it hasn't caught on to the extent that Python, Node.js, Go, or Rust have. Some of E's concepts have been brought into the Javascript world, but many are not yet available outside of a real E interpreter. Porting these ideas to a different environment would be pretty useful.

E's programming model imposes a number of requirements on any implementation. The basic ones are the network protocols that define how Vats on different computers should talk to each other. E includes VatTP and CapTP for this purpose, but they're somewhat dated (they use Java serialization, MD5, SHA1, DSA, and RSA).

These protocols are connection-oriented: when a TLS/TCP connection is severed, object references that cross that connection are "broken", and both sides can react to the break (by restarting from some persistent identifier, and rebuilding everything that was lost). The persistent identifiers are called SturdyRefs, and are basically an unguessable string, plus enough crypto bits to identify and reach a specific TLS server.

This leads applications to distinguish between long-lived externally-referenceable objects (which must live forever, because there's no good way to learn when the secret string that grants access has been forgotten by all callers), and ephemeral objects (which can be called remotely, but the only way to reach them is via messages to one of the long-lived objects). These long-lived objects are clear candidates for persistence.

The Waterken system shares much heritage with E, but provides a non-connection-oriented approach which imposes other requirements: try-forever message delivery, and no ephemeral objects (everything gets a persistent reference string).

The most significant constraint of E's model is orthogonal object persistence. Vats pretend to hold all objects in RAM, forever, on a durable and immortal computer, which nominally means everything gets checkpointed after each message is processed. Application code doesn't need to specify what data is persistent and what is ephemeral: in fact objects are generally unaware of being serialized. At each checkpoint, the Vat walks the table that maps external SturdyRefs to local objects, then chases down everything which those objects reference, transitively. The platform must know how to serialize each object, including cycles, platform objects like file handles, and immutable containers that can only be reconstructed in a specific order. E can serialize the code that implements object methods, closures, and whatever other objects were captured from their parent scope. This is important, since much of E's encapsulation is based on lexically-scoped closures which capture (otherwise-unreachable) private state variables.

If we assume that the Vat server stays up most of the time, but is occasionally restarted, then we might persist only the long-lived externally-referenceable objects. Ephemeral objects work normally until the server is stopped, at which point all connections are dropped anyways. Restarting the Vat is equivalent to breaking all external connections, throwing out all objects that aren't reachable by one of the persistent roots, then waiting for external callers to reconnect to a root.

Something Like E, But Different

I want to use something like E, but with a familiar language (Javascript, specifically Dr. SES), and which can run on a regular non-immortal computer. I'd like an HTTP-based non-connection-oriented transport, in which messages can be stored and forwarded from one node to another safely (which implies signatures and encryption, rather than transport-based security).

I don't know how to make Javascript do orthogonal persistence, and I'm not sure it's a good idea anyways (in my experience with Twisted's old "TAP" files, it was awfully easy to accidentally reference non-serializable objects, or very large objects, and saving raw object state interacts badly with attempts to upgrade program code). So I'm interested in something that does non-orthogonal serialization, where all persistent state lives in a single "Memory" object, and application code calls set() and get() to access it.
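
A minimal sketch of what that might look like, assuming JSON-serializable values and ignoring atomic-write concerns:

import json

class Memory:
    # All persistent state lives here; everything else is ephemeral.
    def __init__(self, path):
        self._path = path
        try:
            with open(path) as f:
                self._data = json.load(f)
        except FileNotFoundError:
            self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value
        with open(self._path, "w") as f:
            json.dump(self._data, f)  # checkpoint after every change

Application code decides what goes into the Memory; nothing outside of it survives a restart.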

Offline Uninvolved Attenuated Delegation

The most interesting features of object-capability systems are delegation and attenuation, wherein new kinds of authorities can be created by wrapping and combining existing ones. Security analysis of a system is much easier when you know an upper bound on the power available to any given component, and object references make this pretty easy to reason about.

When Alice holds a reference to some object on Bob's server, she holds the authority to invoke that object. She can delegate that authority to Carol by sharing her object reference (which might just be a secret string). But the more interesting case is where Alice wants to delegate a subset of that authority. The standard ocap way to attenuate authority is to create a wrapper object: the wrapper can apply restrictions on the method being called, or on the arguments passed, or it can simply throw an exception instead of proxying the message.
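
For example (a sketch, with hypothetical method names): suppose the target object is a key-value store, and Alice wants Carol to be able to read but never write. She wraps it:

class ReadOnlyWrapper:
    def __init__(self, target):
        self._target = target  # private: the holder never sees the raw target

    def get(self, key):
        return self._target.get(key)  # proxied through unchanged

    def set(self, key, value):
        raise PermissionError("read-only reference")  # attenuated away

Carol receives a reference to the wrapper, not to the target, so the only authority she holds is the subset the wrapper chooses to pass through.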

(I'm being pretty casual about whether "Alice" means the human who runs the computer on which some object lives, or the anthropomorphized object itself; hopefully this won't cause confusion.)

E lets you create these wrappers easily, but by default they live on the local server. In our example, this would result in Carol holding a reference to some wrapper object on Alice's computer, which must be involved whenever Carol wants to invoke that object (this impacts availability, latency, and overall bandwidth cost, as the messages must traverse both the Carol->Alice and Alice->Bob links, instead of just the Carol->Bob link).

To remove Alice's involvement, the wrapper must be created on Bob's Vat: the same Vat as the target object. Creating it on Carol's Vat would be obviously insecure, since she could bypass it any time she wanted. Bob already has that power (since he owns the computer where the target lives), so we don't lose any security by asking Bob to host the wrapper too. To build it there, either Bob's object must offer an explicit eval() method, or E's proposed where() feature must be implemented (a generalized tool for creating an object on the same Vat as some initial target).

In addition, these E wrappers/proxies must be constructed on Bob's Vat first, so Alice can acquire a reference to them, which she can later deliver to Carol. Bob must be online to perform attenuated delegation.

I'd like this delegation to be "offline", meaning that Alice can perform this attenuation without talking to anyone else (Bob in particular). I'd also like Alice to be uninvolved with the eventual exercise of this delegated authority. And I'd like the wrapper to not consume persistent space on Bob's computer: until it is exercised, only Alice should pay the cost of creating and maintaining the wrapper object.

What Is A Server, Anyways?

At a high level of abstraction, all servers are equal. In a zero-latency infinite-bandwidth universally-trusted Internet (presumably managed by frictionless uniform-density spherical cows, favorites of physics students everywhere), it doesn't matter where any object lives. Remote object references are every bit as good as local ones (eventual-send vs immediate-call being the only difference). Messages are executed instantly, regardless of argument size or the complexity of their code.

Once we acknowledge that the speed of light is finite, we start to care about latency. We'll want to move objects closer to the data they need, and we discover some tradeoffs. Caching/mirroring moves the data to the code, which helps if the data is small and the code is large. SQL-style stored procedures and Promise pipelining move the code to the data, which helps in the opposite situation.

The next stage of descent from the abstractosphere reveals trust issues, where we don't want to grant too much authority to the server that hosts a given object. Any reference we give to an object on Alice's Vat will be available to Alice herself, which we might not like. The subsequent uncomfortable realization regards confidentiality: any information we reveal to Alice's objects is also visible to Alice. This prompts the development of tools which keep certain objects inside a security perimeter: a set of approved Vats.

But the core proposition remains: if you like the server, and you can move code to data or vice versa, then all servers are equal. Today's question is how you get the code there in the first place: how to turn a bare slab of CPU and storage into a server that's useful for your particular purpose.

The traditional non-Vat-ish server (e.g. a LAMP stack) is highly pre-configured. You split most web applications into a client half and a server half, and write a bunch of code that defines how the server operates. This server code might be bundled into a container image, which bottoms out in Linux syscalls and a kernel TCP stack. The admin must provision and launch a server first, then they can tell clients where to find it (as well as granting access credentials).

The Vat-ish approach is for the server admin to evaluate a big chunk of text, which produces a persistent object and a SturdyRef.

The direction I want to explore is this: why should we need to pre-configure the server? Since most API calls boil down to an object identifier, a method name, and some serialized arguments, would it be possible to build the server on the fly in response to (roughly) the same amount of inbound message data?

On-Demand Server Creation

Imagine a very generic server, with just the following API at the root URL path:

@post("/")
def post(body):
    f = eval(body)
    return f()

(my pseudocode uses a Python/Flask-like syntax, but of course we'd use a properly-confined language, where eval() doesn't grant the client-supplied code access to anything else: it can't import a library to write to the filesystem, or make additional network calls. It can think furiously, but has neither voice nor hands.)

This isn't very useful yet. Perhaps if our server has a bigger CPU than the client, this would let them offload some intensive computation. But each request is isolated and independent.

So let's add some memory to this brain:

memory = dict()
@post("/")
def post(body):
    f = eval(body)
    return f(memory)

Now the client-supplied code has access to a basic dictionary/map, which it can use to build an object store. One call can leave data for a subsequent one. Entire companies revolve around providing tools like this. The set-side code could implement clever indexing tables, allowing the get-side code to locate data faster. Arbitrary search criteria could be used. This could provide a database with server-side map/reduce functions. Clients look like this:

STORE = '''
def run(memory):
    memory["INDEX"] = "value"
'''

POST(url, body=STORE)

Does it feel wrong to supply the entire server program each time? Let's cache it. First, let's just cache the text of the program. We augment the protocol so the client normally delivers just the hash; if the server doesn't recognize that hash, the client makes a second request (to a different endpoint) that includes the complete program. If the client believes that the server doesn't already have a copy, it will preemptively fill the cache before sending the invocation.

program_cache = dict()
@post("/cache")
def cache_program(program_hash, program_body):
    if program_hash not in program_cache:
        if not program_body:
            raise NeedProgramBody
        if hash(program_body) != program_hash:
            raise BadHash
        program_cache[program_hash] = program_body

def get_program(program_hash):
    return program_cache[program_hash]

memory = dict()
@post("/")
def http_post(program_hash):
    program = get_program(program_hash)
    f = eval(program)
    return f(memory)

Next, let's cache the compiled/evaluated program too. E defines the notion of Auditors which let one object ask questions about the code inside another object. The DeepFrozen auditor asserts that a given object has no references to any mutable state. Any mutability must be passed into the object on each call. This makes it safe to cache the callable object.

As Clarke's maxim says, sufficiently advanced technology is indistinguishable from magic. These programs define an intricate process, and carry great power when invoked. The hashes are like magic words which uniquely identify a program, so let's call them a spell.

callable_cache = dict()
def get_callable(spell):
    if spell not in callable_cache:
        program_body = get_program(spell)
        f = eval(program_body)
        if not DeepFrozen(f):
            raise ProgramIsNotDeepFrozen
        callable_cache[spell] = f
    return callable_cache[spell]

@post("/")
def http_post(spell):
    f = get_callable(spell)
    return f(memory)

In the subsequent examples, I'll omit the details of this cache, but rest assured that it can work: with enough engineering, we won't have to re-deliver the whole program every time.

Security of Spellcasting

Obviously, our little server is completely unprotected: anyone on the internet can guess the IP address and port number, and supply programs (aka "cast spells"). And callers are not isolated from each other. Every POST could do anything it wants to the stored data: steal secrets, corrupt state, or just delete it entirely.

All security on the internet is based upon knowledge of secrets, and upon various ways of proving that knowledge to others. Bearer credentials (e.g. API tokens) sound simple, but they require several things: a TLS stack, a live TCP connection, trust in some particular set of CA roots, and the discipline to only reveal the token to the right destination (each secret is scoped to a particular recipient's identity).

Digital signatures are more generic. If we put aside confidentiality for a moment, signed request messages don't require any notion of server identity, and they can be forwarded along arbitrary paths before eventual delivery (making them usable in asynchronous environments, or between machines with intermittent network connectivity). Using them safely requires something to prevent replay attacks, but bearer credentials did too; it's just easy to forget because the obligatory TLS connection generally did that for free.

Signatures involve four things: a Signer (which holds the private key), a Verifier (which holds the public key), the Message, and the Signed Message. Apart from key generation, there are only two operations: the Signer can turn a Message into a Signed Message, and the Verifier can turn a Signed Message into a Message (or throw an exception if the signature was invalid). I learned this API from djb's Ed25519 library, and I like it because it discourages the use of unverified data: the only way to get the Message is to go through the Verifier first. There are situations in which detached signatures make sense, but usually as some sort of optimization.
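
For a concrete example, PyNaCl (a Python binding to libsodium) exposes Ed25519 with this attached-signature shape:

from nacl.signing import SigningKey
from nacl.exceptions import BadSignatureError

signer = SigningKey.generate()  # holds the private key
verifier = signer.verify_key    # holds the public key; safe to publish

signed_message = signer.sign(b"def run(memory): ...")

# the only way to get the Message back out is through the Verifier
try:
    message = verifier.verify(signed_message)
except BadSignatureError:
    raise  # reject the request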

For our server, the simplest way to limit access to specific clients is to generate a keypair, and embed the public verifier key in the server (either we generate the private signing key and give it to the client, or we have the client generate the signing key and tell us the verifier). Only clients that know the corresponding private key will be able to make calls, and they'll express this knowledge by signing each request.

OWNER_VERIFIER = Verifier("4bc6b2faa20a39acf51e7e..")
memory = dict()

@post("/")
def http_post(signed_spell):
    spell = OWNER_VERIFIER.verify(signed_spell)
    f = get_callable(spell)
    return f(memory)

Now only a client who knows the matching Signer private key can use our storage and compute resources.

OWNER_SIGNER = Signer("3d677d3fa31a8c88..")

STORE = '''
def run(memory):
    memory["INDEX"] = "value"
'''

signed_spell = OWNER_SIGNER.sign(STORE)
POST(url, body=signed_spell)

Multiple Clients

What if we want to have two clients, who use separate resources? Alice has been given free rein on the server, and she'd like to hand out "accounts" to her friends Bob and Bert (who have distinct keys). The simplest way would be to give them isolated dictionaries, and use their verifying keys to distinguish their requests:

# maps client key to their personal 'memory' dict
clients = {Verifier("70ccf6d1ae2edf896b2c887.."): dict(), # Bob
           Verifier("cd5afa8e4737f4313b0d3fc.."): dict(), # Bert
          }

@post("/")
def http_post(client_key, signed_spell):
    if client_key not in clients:
        raise UnknownClientID
    memory = clients[client_key]
    spell = client_key.verify(signed_spell)
    f = get_callable(spell)
    return f(memory)

But this has two problems:

  • Inflexible: Adding more clients must be done by modifying the server's code, which can only be done by the machine's owner.
  • Isolated: Clients cannot interact with each other, even if they wanted to.

Chained Programs

So how about we have the server accept two programs on each request? The first program is controlled by the owner (Alice), and is signed by her. The second comes from the client (signed by Bob or Bert), and gets all its power from the first. Alice's program gets to do two things:

  • decide which key is allowed to sign the second program
  • create an attenuated power object (a wrapper around memory) to be passed into that second program

We define the first "spell component" to be the signed serialization of an "activation record": a 2-tuple of an attenuation program and a verifying key. The key will be used to check the signature of the next component. We don't currently need a verifying key in the second activation record because there's no third program (yet!).

OWNER_VERIFIER = Verifier("4bc6b2faa20a39acf51e7e..")
memory = dict()

@post("/")
def http_post(body):
    (first_component, second_component) = parse(body)
    first_activation_record = OWNER_VERIFIER.verify(first_component)
    (first_program, second_key) = parse(first_activation_record)
    second_activation_record = Verifier(second_key).verify(second_component)
    (second_program, _ignored) = parse(second_activation_record)
    f1 = eval(first_program)
    f2 = eval(second_program)
    power = f1(memory)
    return f2(power)

When Alice wants to create an account for Bob, she gets a verifying key from Bob, and builds a program to attenuate memory down into the Bob-specific subset. She concatenates it with the key the server will use to recognize Bob's programs, and signs the result:

OWNER_SIGNER = Signer("3d677d3fa31a8c88..")

LIMIT_TO_BOB = '''
def run(memory):
    power = memory["bob"]
    return power
'''

record = build_record(LIMIT_TO_BOB, "70ccf6d1ae2edf896b2c887")
PREFIX_FOR_BOB = OWNER_SIGNER.sign(record) # this will be 'first_component'

Alice gives this prefix to Bob. Later, when Bob wants to store something in his private memory space, Bob creates a program to use his attenuated object, and signs it with his key:

BOB_SIGNER = Signer("2f0f57f33e71ee58")
STORE_DATA_AT_INDEX = '''
def run(power):
    power["index"] = "data"
'''

record2_store = build_record(STORE_DATA_AT_INDEX, None)
POST(url, body=join(PREFIX_FOR_BOB, BOB_SIGNER.sign(record2_store)))
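
The build_record(), parse(), and join() helpers above are deliberately abstract. Here's a minimal sketch, arbitrarily choosing JSON as the serialization (note that parse() is overloaded in the pseudocode: it unpacks both a whole request body and a single activation record):

import json

def build_record(program, next_key):
    # an "activation record" is a 2-tuple: (program, next verifying key)
    return json.dumps([program, next_key])

def parse(serialized):
    return tuple(json.loads(serialized))

def join(*signed_components):
    # assumes the signed components have already been encoded as strings
    # (e.g. base64); a real protocol would pick a binary framing instead
    return json.dumps(list(signed_components))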

The server will run limit_to_bob, create a power object (to Alice's original specifications) just for Bob, then pass it to Bob's function (which stores some data). It runs store_data_at_index if and only if it was signed with the key that was delivered alongside limit_to_bob.

When Bob wants to retrieve that data, he creates a different program:

def get_index(power):
    return power["index"]

and signs and POSTs it as before, using PREFIX_FOR_BOB again.

When Alice wants to create an account for Bert, she builds a limit_to_bert() that looks just like limit_to_bob except that it requires a different public verifying key, and returns a power that holds a different subset of the memory object:

def limit_to_bert(memory):
    power = memory["bert"]
    return power

Bert gets this new prefix, and uses it for all his requests.

Suppose that Alice owns the execution host. She can write attenuator programs Alice1, Alice2, Alice3, etc, all of which will be accepted and executed by the host, with full power. With each program, she can attach a single identifier that enables a client like Bob to execute a second program with less authority. Bob may have several such programs, all of which are signed with (or otherwise protected by) the same key. So if Alice1 is tied to Bob's key, then Alice1.Bob1 and Alice1.Bob2 are valid sub-programs. Alice2 might be tied to Bert's key, enabling Alice2.Bert1 and Alice2.Bert2. But "Alice1.Bert1" would be rejected, because the Alice1 program doesn't authorize Bert's key.

Delegation

Suppose Bob stores a number in his subset of memory, and wants to let Carol increment this value (but never decrement it). If we add a third level of programs, then Bob can get a verifying key from Carol, and give her a second-level prefix which exposes an increment() function:

def limit_to_increment(power):
    def increment(value):
        assert isinstance(value, float)
        assert value >= 0.0
        power["index"] += value
    return increment

and Carol uses this by calling the power object she receives:

def increment_by_5(power):
    power(5.0)

Spells are Arbitrary-Length Lists of Attenuation Programs

We can generalize this, so the server accepts an entire list of signed programs. Each program gets to decide whether it likes the following one; the server executes each program to create an attenuated subset of the passed-in power, then gives that subset to the next function. The last function is the only one which actually does anything.

To be specific, a "spell" is a list of signed activation records (the "spell components"). Each record identifies both a program and the verifying key for the next component.

OWNER_VERIFIER = Verifier("4bc6b2faa20a39acf51e7e..")
memory = dict()

@post("/")
def http_post(body):
    signed_records = parse(body)
    next_verifier = OWNER_VERIFIER
    power = memory
    for i in range(len(signed_records)):
        signed_record = signed_records[i]
        record = next_verifier.verify(signed_record)
        program, next_key = parse(record)
        next_verifier = Verifier(next_key)
        f = eval(program)
        if i+1 == len(signed_records):
            # this is the last component; 'next_key' is None and ignored
            return f(power) # finally invoke the leaf
        else:
            power = f(power) # attenuate power for the next step

All non-leaf programs are run with a power argument; however, we require that none of them actually invoke it yet: they may only build wrappers around it. No messages are sent to any power objects until the final leaf function is executed. We might enforce this with an Auditor of some sort, or by wrapping each power object in a proxy that rejects invocation attempts until they are all switched on, just before the final leaf runs.
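
A sketch of the second option, a latching proxy (all names hypothetical):

class LatchedPower:
    def __init__(self, target):
        self._target = target
        self._armed = False

    def arm(self):
        # the server flips this on every proxy in the chain,
        # just before the final leaf function runs
        self._armed = True

    def __getattr__(self, name):
        if not self._armed:
            raise RuntimeError("power may not be invoked during spell construction")
        return getattr(self._target, name)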

All programs are DeepFrozen, the only mutability comes from the power argument, and nothing invokes it during construction of the tree, so evaluating these programs cannot make any changes to mutable state. The lack of side-effects makes the results safe to cache, which is a big win.

We'll build a tree cache: a tree in which each path from the root is a list of components. At each node, we'll cache two things. The first is the power value for that path (the output of running the compiled program), which will encapsulate a reference to the tree-parent's cached power object (which itself references its own parent, and so on, up to the root, which is just the original Power object, without any code). These power objects get weaker as we get further from the root, since there are more attenuators in the way. The other value at each node is the verifier, which is used to recognize programs that are allowed to use the power object.
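
A sketch of that cache, building on the pseudocode above, and keying each child by the signed component itself (a real implementation would key by its hash):

class CacheNode:
    def __init__(self, power, verifier):
        self.power = power        # gets weaker as we descend from the root
        self.verifier = verifier  # recognizes this node's next component
        self.children = {}        # signed component -> child CacheNode

root = CacheNode(memory, OWNER_VERIFIER)

def descend(node, signed_record):
    if signed_record not in node.children:
        record = node.verifier.verify(signed_record)
        program, next_key = parse(record)
        f = eval(program)  # DeepFrozen, so safe to evaluate once and cache
        node.children[signed_record] = CacheNode(f(node.power), Verifier(next_key))
    return node.children[signed_record]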

Extensions

So that's the basic idea. There are a lot of implementation details that need to be worked out, so stay tuned for more posts on this subject. In particular, there are several properties that we might want to add:

  • Promises: The wrappers should at least be able to use Promises internally, returning a Promise for the result. It'd even be nice to let these Promises span multiple messages, although that probably means we need a way to store the Resolvers in the persistent storage.

  • Confidentiality: In many contexts, we want privacy for the code we're sending and the results we get back. We'd like to encrypt the programs before sending them, in addition to signing them.

  • Caching and API Compression: We don't really want to send every program in the chain, for every call. This can be compressed by sending hashes instead of programs (and only supplying the programs when necessary). We can compress it further by referring to entire cached chains with a single truncated hash.

  • Evaluation Approaches: There are other possible ways to invoke parent and child programs, giving one the results of the other, which might be more natural for certain delegation patterns.

  • More Power: Memory is the simplest thing we might offer each program, but there are other interesting powers to choose from. The first would be the ability to send messages to other servers, presented as a basic HTTP client, or using some other protocol. We might also allow programs access to a clock, or a random-number generator, enabling non-deterministic behavior. We can use these to prevent replay attacks, or to expire a delegated authority after some amount of time.

  • Rights-Amplification: The chains we've seen so far are purely attenuating: each child program can do no more than its parent. "Rights Amplification" is about combining two program chains together to do more than either one could separately. This will involve a program taking multiple power arguments, each of which derives from a full chain.

  • Creating power in response to the brand: In our current scheme, for Carol to receive power, Bob must create a prefix for her (which references her key). We could instead store information about Carol in a table in the memory, and Bob could write a generic function that would accept messages from any client listed in this table. Bob's function would get to examine the "Brand" of Carol's message, and would attenuate the power differently depending upon what it sees. When the spells are confidential, Bob's program would also be responsible for returning an Unsealer for the client program.