This is part thirteen in a series on the 0.3 version of the language spec for the Merg-E Domain Specific Language for the InnuenDo Web 3.0 stack. I'll add more parts to the below list as the spec progresses:
- part 1 : coding style, files, merging, scoping, name resolution and synchronisation
- part 2 : reverse markdown for documentation
- part 3 : Actors and pools.
- part 4 : Semantic locks, blockers, continuation points and hazardous blockers
- part 5 : Semantic lexing, DAGs, prune / ent and alias.
- part 6 : DAGs and DataFrames as only data structures, and inline lambdas for pure compute.
- part 7 : Freezing
- part 8 : Attenuation, decomposition, and membranes
- part 9 : Sensitive data in immutables and future vault support.
- part 10 : Scalars and High Fidelity JSON
- part 11 : Operators, expressions and precedence.
- part 12 : Robust integers and integer bitwidth generic programming
- part 13 : The Merg-E ownership model, capture rules, and the --trustmebro compiler flag.
- part 14 : Actorcitos and structural iterators
- part 15 : Explicit actorcitos, non-inline structural iterators, runtimes, and abstract scheduler pipeline.
- part 16 : Async functions and resources and the full use of InnuenDo VaultFS
- part 17 : RAM-Points, RAM-points normalization bag, and the quota-membrane.
In the first post in this series we already discussed the ownership semantics for Merg-E. But at that point much of the language spec wasn't elaborated on, so the post wasn't exactly complete on the ownership semantics. In the post about attenuation we discussed membranes and we expanded on the ownership semantics. We touched on ownership semantics a few more times, but it was rather fragmented. In this post we want to both tie up the ownership semantics, and give a holistic view of just ownership semantics with all relevant modifiers and parts of the type system.
Scalars
A little recap of some scalar definition notations, just so we know what we are talking about.
string apihostname = ambient.settings.env "INNUENDO_HOSTNAME";
inert float128 π = 3.1415926535897932384626433832795028;
sensitive string apiKey = ambient.settings.vault.innuendofs "innuendo-api-key";
mutable float32 g = 9.80665;
shared mutable uint64 v = 246000000000;
borrowed mutable string notebook = "my little notebook." |||
"Ill let each of my friends write here";
Here:
apihostname : an implicitly non-sensitive implicitly non-mutable string scalar
π : an explicitly non-sensitive implicitly non-mutable 128 bit floating point scalar
apiKey : an explicitly sensitive implicitly non-mutable string scalar
g : an implicitly non-sharable explicitly mutable 32 bit floating point scalar
v : an explicitly sharable explicitly mutable 64 bit unsigned integer scalar
notebook : an explicitly borrowable explicitly mutable string scalar
We will come back to each of these later on, so let's give them some human context:
apihostname : Because the API host name is read from the config, it is constant and by default we assume that if it was sensitive, it would be in the vault. This constant will be implicitly captured.
π : pi, it doesn't get more constant than that, but the compiler doesn't know pi isn't sensitive, so we need to tell it by marking pi as inert. This constant will be implicitly captured.
apiKey : the API key is taken from the vault, it is a constant, but it is sensitive, so it needs to be explicitly captured or passed as a function argument.
g : Gravity here on Earth. It is mutable, but we can't make it shareable: if Alice starts on Earth and gives g to Bob, who then goes into space, Bob's gravity will change, but that is irrelevant to us. So no shared access; we either keep it or we give it away.
v : the vacuum energy of the Higgs field. If we give it to Bob and Bob on his warp-drive space flight causes vacuum decay, we are all in trouble, not just Bob.
notebook : Like a child's notebook that she lends to her friends, one at a time, and the holder can write in it until they give it back.
A callable recap.
When defining a callable, for example a function, there are three ways the function can gain access to data and functions that exist in the closure where the function lives.
- Implicit capture: This treatment is for all inert non-mutables and frozen dataframes.
- Explicit capture: This is where most of the ownership semantics take place.
- Function arguments: This is where things are closer to what most users expect from ownership semantics, but not quite.
Let's look at a function header line:
blocker allpending;
mutable function myFunction_01( string nb )::{
apiKey;
g;
v;
}{
inert float128 r = 1737400;
inert float128 c = 4 * π * r * r;
g = 1.62;
lock( v )::{
nb;
}{
nb = nb "Hey Alice, greeting from the moon." |||
"I think I just blew up the universe " |||
"while calculating its surface area.";
v = 42090807067;
};
};
allpending += myFunction_01(notebook);
await.all allpending;
Notice that π is implicitly captured because it is an inert constant? The apiKey scalar is also a constant, but because it is sensitive it isn't implicitly captured and we need to pull it in with the ::{} explicit capture. The gravity g is captured explicitly too, but remember that it wasn't sharable, so myFunction_01 now completely owns it and the outer scope is left with a hollow binding. When the outer scope tries to access it, it can't.
The following line would thus now result in a compile error:
inert float32 eg = g;
The vacuum energy of the Higgs field is captured explicitly too, but because it is marked as sharable, the following line would give no direct problems:
inert uint64 ev = v;
Note that because myFunction_01 uses a lock, idiomatically the outer scope should access v under a lock as well, but let's not get distracted for now.
Finally the notebook. After awaiting allpending, we have full ownership over the notebook again.
This all shows things in action but let's make things very explicit so there are no misunderstandings.
Frozen scalar
So what happens when we freeze a scalar? Basically the scalar becomes non-mutable, but only from the point of the freeze onwards. The frozen state can't always be reliably traced at compile time, though, because the compiler does not actually reason its way through conditionals and loops. If a scalar gets frozen in one branch of a conditional but not the other, or if a scalar gets frozen inside of a loop, and a function, actor or lock gets defined afterwards, we say that the mutability is compile time ambiguous. In such cases the compiler will refuse to compile any further and bail out, unless the user makes further promises.
mutable uint8 age = 0;
mutable myScalar = 19744;
while age < 49 {
if age > 21 {
myScalar = myScalar - 5;
}{
myScalar = myScalar + 42;
freeze myScalar;
};
age = age + 1;
};
myFunction(myScalar);
As a human it is easy to see that myScalar will get frozen before myFunction gets called, but the compiler doesn't have that intelligence. It just sees a while and an if and doesn't reason about them. It sees three paths: the path where the while never runs, the path where the while runs but the if never matches, and the path where the while runs and the if matches. It sees that myScalar might or might not be frozen at the point where myFunction is called, and just gives up. That is, unless the user tells us to take the risk:
mutable uint8 age = 0;
mutable myScalar = 19744;
while age < 49 {
if age > 21 {
myScalar = myScalar - 5;
}{
myScalar = myScalar + 42;
freeze myScalar;
};
age = age + 1;
};
myFunction(hazardous frozen myScalar);
This specific use of hazardous is a different beast from the uses of the hazardous modifier that we have seen so far. The reason is that in this specific case, the behaviour of the code in case of breach of promise is undefined. That is, the compiler and/or runtime have some freedom as to how to implement the implication of this promise that myScalar will be frozen:
- It might get compiled defensively by passing myScalar by value, no exceptions, everything irie.
- It might lead to a runtime exception when myFunction is invoked before the invocation gets queued for the scheduler.
- It might do something else that the compiler or runtime thinks is a smart or effective way to implement things in the light of this promise, including ones where the error won't get raised until the invocation is scheduled or even run.
In short, if you don't keep this promise, there is no guarantee that you will ever see an error, and if you do, no guarantee in what execution context.
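The conservative path reasoning behind this refusal can be modelled in a few lines of Python. This is only an illustrative sketch, not how the Merg-E compiler is implemented: the tuple based statement encoding and the names Freeze, join, analyse and may_compile are all assumptions made up for this example.

```python
from enum import Enum


class Freeze(Enum):
    MUTABLE = "mutable"
    FROZEN = "frozen"
    AMBIGUOUS = "ambiguous"


def join(a, b):
    """Merge the states of two alternative paths."""
    return a if a is b else Freeze.AMBIGUOUS


def analyse(stmts, name, state=Freeze.MUTABLE):
    """Track the freeze state of `name` without evaluating any condition."""
    for stmt in stmts:
        kind = stmt[0]
        if kind == "freeze" and stmt[1] == name:
            state = Freeze.FROZEN
        elif kind == "if":
            # ("if", then_body, else_body): either branch might run.
            state = join(analyse(stmt[1], name, state),
                         analyse(stmt[2], name, state))
        elif kind == "while":
            # ("while", body): the body might run zero times or more.
            state = join(state, analyse(stmt[1], name, state))
        # Any other statement (e.g. "assign") leaves the state untouched.
    return state


def may_compile(state, hazardous_frozen=False):
    """Ambiguity is only accepted when the user promised `hazardous frozen`."""
    return state is not Freeze.AMBIGUOUS or hazardous_frozen


# Hypothetical encoding of the while/if example from this section.
program = [
    ("while", [
        ("if",
         [("assign", "myScalar")],
         [("assign", "myScalar"), ("freeze", "myScalar")]),
        ("assign", "age"),
    ]),
]

state = analyse(program, "myScalar")
print(state.value)                                # ambiguous
print(may_compile(state))                         # False
print(may_compile(state, hazardous_frozen=True))  # True
```

The model never looks at the loop or branch conditions, which is exactly why it ends up ambiguous for a program that a human reader can resolve.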
Trust me bro
Now might be a good time to branch out a bit and look at the hazardous modifier from a helicopter view. We have seen multiple examples of the use of hazardous in Merg-E code, but the one above is by far the most hazardous use of hazardous that the language has to offer. It is important to note that by default the semantic lexer will error out on the hazardous keyword. Merg-E is a least authority language after all, so everything should be as explicit as it can be. Before a programmer can say "Trust me bro" to the compiler by marking a construct as hazardous, we want to be able to tell the compiler to trust the developer, to trust the code. But first we want the programmer to be able to say the same thing about another programmer.
hazardous modifier<blocker> reentrant mutable merge utils.is_prime as is_prime(x int)::{
ok_count;
}{
max_prime;
}@[range_error];
This code tells the compiler: "I, the programmer of the .mrg program file, trust the programmer of the utils.mrm module file to use the hazardous modifier responsibly on blockers within the is_prime function." Notice the weird modifier usage here. We need to encapsulate blocker in a modifier construct because blocker is a type, not a modifier, and otherwise we would be breaking lexing rules while trying to communicate that the hazardous on the merge refers only to blockers. Think of it as a modifier for a modifier.
We now know that one programmer trusts another programmer to use a hazardous blocker. However, nobody has said yet that we trust the programmer of the main program to use hazardous blockers, or that our trust in them is transitive. We do this with a compiler flag:
merge-e-c /home/me/my/app/dir/ --trustmebro=merge,blocker
This line tells the compiler: we trust the programmer of the main code to use hazardous blockers and this trust is transitive.
We can do the same if we want to trust the main programmer with hazardous frozen mutables like the one from the previous section:
merge-e-c /home/me/my/app/dir/ --trustmebro=merge,blocker,frozen
But is this what we meant? Maybe we want our blocker trust to be transitive, but not our frozen trust. We can express this, even if it's a bit verbose:
merge-e-c /home/me/my/app/dir/ --trustmebro=merge,blocker --trustmebro=frozen
In many CI/CD setups, compiler flags are not the preferred way to work, and environment variables are preferred. But being an opinionated least authority language, we don't like this, at least not by default. Not implicit. But explicit is fine:
merge-e-c /home/me/my/app/dir/ --trustmebro=ENV
Now the compiler is willing to look at the MERG_E_TRUST_ME_BRO environment variable, which can express the same thing, so now we can do something like:
export MERG_E_TRUST_ME_BRO=merge,blocker:frozen
Finally, if we just want to allow all possible uses of hazardous, we can either say so on the command line:
merge-e-c /home/me/my/app/dir/ --trustmebro=ALL
or if we used the --trustmebro=ENV flag, with our environment variable
export MERG_E_TRUST_ME_BRO=ALL
Note that ALL is fully transitive, and if you want less you need to use the verbose variant.
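To make the grouping rules a bit more tangible, here is a small Python sketch of how these flag values could be interpreted. It is not the actual compiler code. It assumes that each --trustmebro value is one group, that ":" separates groups in the environment variable, that the merge token makes the rest of its group transitive, and that ALL overrides everything. The function names parse_group and parse_trust and the "*" wildcard key are made up for this example.

```python
def parse_group(group):
    """Parse one comma separated group, e.g. 'merge,blocker'."""
    tokens = [t.strip() for t in group.split(",") if t.strip()]
    # Assumption: 'merge' marks every other kind in the group as transitive.
    transitive = "merge" in tokens
    return {t: transitive for t in tokens if t != "merge"}


def parse_trust(flag_values, environ):
    """Map each trusted hazardous kind to True (transitive) or False."""
    groups = []
    for value in flag_values:
        if value == "ENV":
            # The environment is only consulted when explicitly asked for.
            groups.extend(environ.get("MERG_E_TRUST_ME_BRO", "").split(":"))
        else:
            groups.append(value)
    trust = {}
    for group in groups:
        if group == "ALL":
            # ALL is fully transitive trust for every hazardous kind.
            return {"*": True}
        for kind, transitive in parse_group(group).items():
            trust[kind] = trust.get(kind, False) or transitive
    return trust


print(parse_trust(["merge,blocker", "frozen"], {}))
# {'blocker': True, 'frozen': False}
print(parse_trust(["ENV"], {"MERG_E_TRUST_ME_BRO": "merge,blocker:frozen"}))
# {'blocker': True, 'frozen': False}
print(parse_trust([], {"MERG_E_TRUST_ME_BRO": "ALL"}))
# {}
```

The last call shows the least authority default: without --trustmebro=ENV the environment variable is simply ignored.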
Ownership and capture semantics for scalars
| scalar-capacity | modifiers | from | capture | arguments | description |
|---|---|---|---|---|---|
| <= 8 bit | | | implicit | by reference | no ownership, constant |
| > 8 bit | | settings | implicit | by reference | no ownership, constant |
| > 8 bit | sensitive | settings | explicit | by reference | no ownership, constant |
| > 8 bit | | literal | implicit | by reference | no ownership, constant, compiler warning!! |
| > 8 bit | inert | literal | explicit | by reference | no ownership, constant |
| > 8 bit | hazardous | literal | implicit | by reference | no ownership, constant |
| > 8 bit | | non-inert expression | explicit | by reference | no ownership, constant |
| > 8 bit | sensitive | non-inert expression | explicit | by reference | no ownership, constant, idiomatic |
| > 8 bit | | inert expression | implicit | by reference | no ownership, constant |
| > 8 bit | inert | inert expression | implicit | by reference | no ownership, constant, idiomatic |
| | mutable | | explicit/move | by value | ownership transfer on explicit capture |
| | shared mutable | | explicit/shared | shared | shared ownership on both explicit capture and invocations |
| | borrowed mutable | | explicit/borrowed | borrowed | borrowed ownership, compiler warning on explicit capture |
A little note on the compiler warnings. A non-tiny literal is treated as inert, but because it might not be, the warning tells the user: please be explicit, mark it inert if it is indeed inert, or mark it sensitive or hazardous if it isn't. The compiler warning on explicit capture of a borrowed mutable is there because of the lifetime of the callable. Holding on to a borrowed mutable longer than necessary is considered non-idiomatic.
Errors
Both the default move of mutables and the use of borrows can lead to errors, but the type of error differs. When you try to access a scalar after a move, this results in a compile error. Compile errors are considered cleaner because you can't get around them: your code is invalid and you need to fix it. It is simply impossible to access a hollow scalar after a move. The errors on borrows are a bit more tricky, also because they may differ between runtime implementations as a result of the language's lack of synchronicity guarantees. When you access a scalar that is still held by the callable that you lent it to, this will result in a runtime error. You should prevent this by awaiting the end of execution of the callable before trying to access the borrowed mutable again.
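The difference between the two error types can be mimicked with a toy Python model. This is a sketch under stated assumptions, not the Merg-E runtime: Python has no compile time ownership checks, so the compile error is played by an exception as well, and the class names Binding, Loan, HollowBindingError and StillBorrowedError are invented for this example.

```python
class HollowBindingError(Exception):
    """Stands in for the compile error after a move."""


class StillBorrowedError(Exception):
    """Stands in for the runtime error while a borrow is outstanding."""


class Loan:
    """Handle given to the callable that currently holds the borrow."""

    def __init__(self, owner):
        self._owner = owner

    def write(self, value):
        self._owner._value = value

    def give_back(self):
        self._owner._lent = False


class Binding:
    """A mutable scalar binding that can be moved away or lent out."""

    def __init__(self, value):
        self._value = value
        self._hollow = False
        self._lent = False

    def get(self):
        if self._hollow:
            raise HollowBindingError("binding was moved away")
        if self._lent:
            raise StillBorrowedError("binding is still lent out")
        return self._value

    def move(self):
        # The old binding becomes hollow, the new one owns the value.
        value = self.get()
        self._hollow = True
        self._value = None
        return Binding(value)

    def lend(self):
        # Only an accessible binding can be lent out.
        self.get()
        self._lent = True
        return Loan(self)


g = Binding(9.80665)
moved_g = g.move()
try:
    g.get()
except HollowBindingError as err:
    print("move:", err)

notebook = Binding("my little notebook.")
loan = notebook.lend()
try:
    notebook.get()
except StillBorrowedError as err:
    print("borrow:", err)
loan.write("Hey Alice, greeting from the moon.")
loan.give_back()
print(notebook.get())
```

Calling give_back here plays the role of awaiting the callable: only after that point does the lender regain access.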
Ownership and scheduling
We will get into the subject of the scheduler and its different implementations in different runtimes in a future post, but there are two specific scheduler topics we need to at least touch on in this post because they relate to ownership duration and the scheduling implications of sharing primitives. Let's start with a little bit about callable and callable scope lifetimes, also with respect to the no assumptions parallelism mantra of Merg-E.
Actors are by default very long lived: without explicit management, their longevity is close to that of the program's main operating system process. Functions, in contrast, are conceptually ephemeral, or at least their scope DAG is. However having thousands if not millions of ephemeral scope DAGs, even when using COM (Copy On Mutate) membranes, destroyed and rebuilt all of the time would be highly inefficient. The runtime will thus optimize things by re-using scopes, but more importantly, by re-using the explicit captures containing shared and borrowed mutables. In short, as long as there are still workloads for a function in the scheduling queue (not an actual queue, but we will get back to that in a future post), in a handling queue or actively being handled, it is highly likely that the scope will get reused, and thus borrows won't get released. And even afterwards, if the parent scope is still uncompleted and might either still invoke a new function instance or await the already finished ones, the runtime might hold on to the scope and not destroy it yet, thinking it might need it later on. This is not deterministic, hence the advice not to hand over borrows in the explicit capture filter.
But explicitly captured borrows have another scheduling consequence too. A borrowed mutable can only be held by one logical ephemeral function at a time, so the scheduler can't schedule invocations of such a function in parallel, not even if it is defined as reentrant, because being reentrant isn't a promise that a function holding and actually using a borrowed mutable is able to keep. Basically, capturing a borrowed mutable serializes all invocations, turning our ephemeral, potentially highly parallelizable function into almost an actor, but without the actual pros of being an actor.
A second subject regarding scheduling that is important has to do with locks. A lock both is and isn't the same execution context as its parent scope. A never awaited lock can have its own explicit capture filter and may even need it if it touches mutables outside of the lock spec. But an awaited lock changes the semantics and the scheduling. It is basically delayed until the await is initiated and the parent execution scope is suspended, and when it becomes active it will basically inherit (or implicitly capture if you like, but without actual moves or borrows) the captured and locally defined mutables of its direct parent scope. When the lock body is done, the suspended parent scope gains everything back as if the lock never captured anything.
This all sounds overly complicated, but it is what is needed to provide the user with the semantic experience of a real lock, while keeping the implementation of the lock a pure scheduling primitive.
The quirks of dataframes
So much for scalars. Now on to the first of two non-scalars in the light of capturing and ownership, and most importantly, freezing.
As we discussed before, a dataframe always starts off its life being mutable, but mutable in a very specific way. You can add rows to the dataframe, one row at a time, but the rows are immediately immutable. And then, when you want to actually use it, you first need to freeze the dataframe before you can, and there are ways of transfer that are prohibited to avoid compiler complexity.
One important thing that can't be emphasized enough: There is no capturing of stand-alone dataframes!
This may feel weird, especially in the light of how they can be captured under certain conditions as part of a DAG, but when you realize that Merg-E treats dataframes like it treats callables (that we will look at in the next section), and that DAGs are managed through membranes (also a subject we will look into later in this post), things might cognitively fall into place.
So in short, dataframes need to be frozen before you can use them, and when you can finally use them and want to do so with a callable (function or actor), you need to pass them as a function argument, or more idiomatically use them in a vectorize expression.
Callables, mutable callables and DAGs.
A callable that isn't marked as mutable has no authority other than possibly some sensitive data that we expect it to encapsulate. We thus consider a non-mutable callable to be powerless and will implicitly capture it like we do with inert immutables. When we get to mutable callables though, things start touching capability theory, and we end up wanting to use graphs and graph theory so we can reason about capabilities as a graph. As such, we don't want these capabilities to be captured either implicitly or explicitly. Instead we use function arguments to transfer them as a DAG, and if the callable wants to keep hold of the handed over capability DAG node, it can ent it into its own authority tree. This way of thinking about authority might be foreign at first, but once you start thinking of capabilities as a DAG in their own right, reasoning about authority actually becomes much simpler.
bobActor(daggify carolFunction, label);
Imagine that our outer scope is aliceActor. aliceActor has a bobActor and a carolFunction, and aliceActor wants to introduce carolFunction to bobActor. This function invocation represents the Granovetter diagram we show above. Now bobActor can take the daggified carolFunction and ent (graft) it into its private authority tree.
ent scope.export peerdag as myCarol;
This line of bobActor code grafts the peerdag function argument under the name myCarol in the scope.export branch of bobActor's scope. Thinking about authority in this way may take a little getting used to, but it is at the core of the Merg-E language.
But this isn't the end of it. Is access to carolFunction shared? Or did we just do a move? The answer to this is short and simple: shared.
We consider functions an encapsulation of whatever ownership pattern we need to support, and we consider the object capability model sufficient for managing further access.
By covering mutable functions we basically covered most of DAG ownership already. DAGs can be shared or delegated as function arguments. But while it is seen as non-idiomatic for functions and actors, for deeper DAGs explicit captures are actually the more idiomatic way to transfer ownership, and transfer is what you will get. An explicit capture of a DAG is a move.
dag entropy = prune scope.ambient.os.entropy;
mutable actor function myCryptoBox(string data, string operation, callable<string> emit)::{
entropy;
}{
..
};
No need to ent (graft) anything: ownership is transferred and the DAG is entered into the child scope.
Coming up
In this post we revisited all of the little snippets of ownership modeling we discussed so far and we filled in the blanks. We discussed how scalars, dataframes and DAGs are shared and captured, and how the different ownership semantics all fit together into Merg-E's least authority, DAG based language design. We also took a little deep dive into the trust and authority of the hazardous modifier, and how it fits with least authority from the lexer's perspective. We discussed the use of the tongue-in-cheek named --trustmebro compiler flag, and how it can be used to make the Merg-E language more or less trusting of the programmer when it comes to doing less idiomatic things that sometimes might be needed.
I'll need at least one more post to talk about parallelism models, iterators, and possibly a few more.