Version 0.3 of the Merg-E language specification : RAM-Points, RAM-...

This is part seventeen in a series on the 0.3 version of the language spec for the Merg-E Domain Specific Language for the InnuenDo Web 3.0 stack. I'll add more parts to the below list as the spec progresses:

part 1 : coding style, files, merging, scoping, name resolution and synchronisation
part 2 : reverse markdown for documentation
part 3 : Actors and pools.
part 4 : Semantic locks, blockers, continuation points and hazardous blockers
part 5 : Semantic lexing, DAGs, prune / ent and alias.
part 6 : DAGs and DataFrames as only data structures, and inline lambdas for pure compute.
part 7 : Freezing
part 8 : Attenuation, decomposition, and membranes
part 9 : Sensitive data in immutables and future vault support.
part 10 : Scalars and High Fidelity JSON
part 11 : Operators, expressions and precedence.
part 12 : Robust integers and integer bitwidth generic programming
part 13 : The Merg-E ownership model, capture rules, and the --trustmebro compiler flag.
part 14 : Actorcitos and structural iterators
part 15 : Explicit actorcitos, non-inline structural iterators, runtimes, and abstract scheduler pipeline.
part 16 : async functions and resources and the full use of InnuenDo VaultFS
part 17: RAM-Points, RAM-points normalization bag, and the quota-membrane.
part 18: Literal operators & Rational and Complex numbers.

In part 8 we looked at membranes a bit and discussed a number of membranes, and we discussed the quota-caretaker. In this post we are going to look at a currently only partially specified combination of quota with a membrane for one specific purpose: treating memory as a least-authority resource. But before we can do that, we need to look into how the multi-runtime concept for Merg-E dictates that we introduce a normalized RAM-Points system.

The basket of goods normalization.

In economic settings, some kind of grocery basket of goods is often used as a weighted reference, for example to calculate a Consumer Price Index. Such a basket is meant to reflect things like modern spending trends, and contains hundreds of items in an attempt to be representative enough to base further calculations on. For Merg-E such a basket contains multiple DAGs with a variety of node types and membranes, and a variety of integer, string and dataframe sizes that is considered to be representative of typical use. The exact content of this basket is yet to be determined as the language evolves and demo codebases grow, and is expected to be rather fluid during early language development.

In runtime-A the memory footprint of an int128 may be bigger, relative to basket size, than in runtime-B, while the relative footprint of that same int128 when added through a copy-on-mutate membrane might be bigger for runtime-B. What we do next is count the total number of useful non-duplicating entities in our basket. Non-duplicating means, for example, that we don't count dataframe columns if we are already counting the rows; instead we attribute things such as column-based storage to the dataframe itself, and we assign row storage needs not already covered by the scalars to the row.

So in the end we have a collection of entities: scalars, dataframes, rows, non-scalar DAG nodes, membranes, caretakers, scopes, etc., and we divide the total memory usage of our whole basket by the number of entities. The result is one RAM-point. Then we go back to the individual basket item types, and we normalize the RAM usage of each item type to RAM-points. The result should be that a 1000 RAM-point DAG-view on Runtime-A is roughly the same number of RAM-points on Runtime-B, even if Runtime-A uses 4 times the amount of actual RAM.
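To make the normalization concrete, here is a minimal Python sketch of the arithmetic. It is not part of the spec: the entity types, counts and byte sizes are invented for illustration (the real basket is still undefined), and the names `ram_point_bytes` and `normalized_costs` are hypothetical.

```python
# Hypothetical basket measurements: entity type -> (count in basket, bytes per entity).
# All numbers are invented; runtime A simply uses 4x the memory of runtime B.
RUNTIME_A = {"int128": (100, 64), "dataframe_row": (50, 256), "membrane": (10, 1024)}
RUNTIME_B = {"int128": (100, 16), "dataframe_row": (50, 64), "membrane": (10, 256)}


def ram_point_bytes(basket):
    """Bytes that make up one RAM-point: total basket memory / number of entities."""
    total_bytes = sum(count * size for count, size in basket.values())
    total_entities = sum(count for count, _ in basket.values())
    return total_bytes / total_entities


def normalized_costs(basket):
    """Cost of each entity type in RAM-points for the runtime that owns this basket."""
    point = ram_point_bytes(basket)
    return {name: size / point for name, (_, size) in basket.items()}


costs_a = normalized_costs(RUNTIME_A)
costs_b = normalized_costs(RUNTIME_B)
```

In this toy setup one RAM-point is 184 bytes on runtime A and 46 bytes on runtime B, yet every entity type costs the same number of RAM-points on both. With real runtimes the ratios will differ per type, so the match would only be approximate.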

It is important to note that the basket will remain fluid until the 1.0 version of the language spec.

RAM-points and the type system.

RAM-points are a more pervasive concept than they might appear at first glance. Every scalar type must be able to communicate its cost at runtime, and every membrane needs to be able to communicate its additional cost for these types. Because Merg-E supports thin abstractions, this information will be available to the Merg-E programmer:

inert float16 points1 = lang.memory.int.rampoints(lang.types.int2048)
inert float16 points2 = lang.memory.string.dynamic_rampoints(800)
inert float16 points3 = lang.memory.dag.dag_rampoints(mydag.some.spot, lang.types.int128)

Normally the user won't need to use these, so we leave elaboration to a later version of the language spec.

A quota membrane

Now that we have defined RAM-Points as deeply integrated into the type system, we can start talking about the quota membrane. It is important to note that this membrane does not actually enforce the quota at RAM allocation time. It enforces it either at ent (graft) time, or eventually at dataframe grow-time (ruled by parallelism primitives). You can define a quota membrane around every non-scalar non-dataframe DAG node.

mutable dag newdag = membrane<quota<80, 40, 30>> olddag;

What the above statement does is define newdag as a quota-membrane over olddag with a total quota of 80 RAM-points. The second number claims 40 RAM-points for direct allocations with the membrane itself, and the third number defines that layered membranes may be created with a total overcommit of 30 RAM-points. This means the maximum quota pool available to hand out to directly stacked child membranes is calculated as: (Total Quota - Direct Allocation) + Overcommit. For newdag that is (80 - 40) + 30 = 70 RAM-points.

So the following will be valid:

mutable dag dag3 = membrane<quota<60,30, 10>> newdag;
mutable dag dag4 = membrane<quota<40,40, 0>> dag3;
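The pool arithmetic for the three membranes above can be checked with a small Python sketch. The `Quota` type and the `child_pool` and `can_stack` helpers are hypothetical names for illustration. The sketch assumes a child is valid when its total quota fits inside the parent's pool; how several sibling membranes share one pool is not modelled here.

```python
from typing import NamedTuple


class Quota(NamedTuple):
    total: int       # first number: total quota in RAM-points
    direct: int      # second number: claimed for direct allocations
    overcommit: int  # third number: allowed overcommit for stacked membranes


def child_pool(q: Quota) -> int:
    """(Total Quota - Direct Allocation) + Overcommit."""
    return (q.total - q.direct) + q.overcommit


def can_stack(parent: Quota, child: Quota) -> bool:
    """Assumption: a child fits when its total does not exceed the parent's pool."""
    return child.total <= child_pool(parent)


newdag = Quota(80, 40, 30)
dag3 = Quota(60, 30, 10)
dag4 = Quota(40, 40, 0)
```

Here newdag has a pool of 70, so the total of 60 for dag3 fits. dag3 in turn has a pool of 40, which dag4 uses up exactly, and dag4 itself has nothing left to hand out.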

On prune and on ent (graft)

Before we look at the ent (graft)-time enforcement of quota, we need to look at pruning first. Merg-E keeps a strict policy of same-pool grafting so as to prevent any function from gaming the system. A sub-DAG that is pruned using one quota membrane becomes free to graft on any directly stacked quota membrane on top of the first quota membrane in the stack. So in the sample above all three DAGs can ent each other's pruned DAGs for free, but if the membranes are unrelated, or if there are other types of membranes in between, the full price will be paid on grafting.

So the following will use up RAM-points:

mutable dag version = prune scope.imported.lang.Version;
ent dag4 version as Version;

But this won't:

mutable dag mypi = prune newdag.coolfloat256.pi;
ent dag4 mypi as Pi;
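The difference between the two cases can be modelled with a rough Python sketch. This is one interpretation, not the spec's algorithm. `Membrane`, `pool_root` and `graft_cost` are hypothetical names, and "same pool" is assumed to mean "connected through an unbroken run of directly stacked quota membranes".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Membrane:
    name: str
    kind: str                            # "quota" or any other membrane kind
    below: Optional["Membrane"] = None   # membrane this one is stacked on, if any


def pool_root(m: Optional[Membrane]) -> Optional[Membrane]:
    """Lowest quota membrane reachable through quota membranes only."""
    if m is None or m.kind != "quota":
        return None
    while m.below is not None and m.below.kind == "quota":
        m = m.below
    return m


def graft_cost(pruned_via: Optional[Membrane], target: Membrane, rampoints: float) -> float:
    """Free inside one quota stack, full price in every other case."""
    src, dst = pool_root(pruned_via), pool_root(target)
    same_pool = src is not None and src is dst
    return 0.0 if same_pool else rampoints


newdag = Membrane("newdag", "quota")
dag3 = Membrane("dag3", "quota", newdag)
dag4 = Membrane("dag4", "quota", dag3)
other = Membrane("other", "attenuation", dag3)   # a non-quota membrane in between
dag5 = Membrane("dag5", "quota", other)
```

With these definitions, `graft_cost(newdag, dag4, 5.0)` is zero (the Pi case), while `graft_cost(None, dag4, 5.0)` and `graft_cost(newdag, dag5, 5.0)` both charge the full 5 RAM-points.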

Applying a membrane to the export to import of a closure

There is one special case where the quota membrane requires a tweak to the existing syntax of an important language construct, and that is when importing (merging) module code or defining a callable, or, to be more precise, on scope construction (where scope construction is triggered by either the first invocation of the callable, or the first invocation of that callable after a prior cleanup by Níðhöggr). We need to remember that when a scope is constructed:

The scope.import of the child scope is populated with DAG nodes from scope.export in the parent scope, which are essentially ent (graft) operations on an empty sub-DAG of scope.
Any definition of constants or variables within the child callable further populates scope.import, again in a way equivalent to ent.

So if we want to sandbox our child callables, we need a way to insert a quota membrane in a syntactically friendly way on top of the import branch of the scope DAG.

We do this by allowing optional parameterizability of both merge statements and function definitions:

mutable merge utils.is_prime as is_prime(x int)<membrane<quota<15,0,0>>>::{
     ok_count: ok_count;
     }{
       max_prime;
       };

Note that the parameterisability takes a membrane as a single argument directly before the ::{}{} blocks. The same syntax is allowed for local and module function definitions.

It is important to note that under these semantics, both locally defined entities and moved captures count towards the quota, but captured constants do not. Under the hood the scope.import and scope.export will fall under the same quota membrane, which will result in an exception to the moved captures rule if the parent scope is a quota membrane too.

On grow (dataframes)

In part 6 we looked at dataframe construction. When a dataframe is part of a DAG and grows like this:

    mutable dataframe inDf string id, int32 startval, int32 endval;
    inDf( "foo"  17 99 );
    inDf( "bar"  42 1999 );
    inDf( "baz"  377 1483 );
    freeze inDf;

things are straightforward. Every inDf invocation runs through the membrane and will raise a runtime error when adding the row violates the quota. The dataframe is left unfrozen with the offending row unadded. The row might have been temporarily added depending on the runtime implementation, but the exception-time dataframe will be in a guaranteed non-offending state. Níðhöggr plays no role in this scenario.
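The observable behaviour described here can be imitated with a short Python sketch. It is only a model of the guarantee, not how any runtime implements it. `QuotaDataFrame` and `QuotaExceeded` are hypothetical names, and a flat per-row cost in RAM-points is assumed for simplicity.

```python
class QuotaExceeded(RuntimeError):
    """Raised when adding a row would violate the quota."""


class QuotaDataFrame:
    def __init__(self, quota_points: float, row_cost: float):
        self.quota_points = quota_points
        self.row_cost = row_cost      # assumption: every row costs the same
        self.rows = []
        self.frozen = False

    def used(self) -> float:
        return len(self.rows) * self.row_cost

    def add_row(self, *values):
        if self.frozen:
            raise RuntimeError("dataframe is frozen")
        if self.used() + self.row_cost > self.quota_points:
            # The offending row is never kept, so the frame stays within quota.
            raise QuotaExceeded("row would exceed the quota")
        self.rows.append(values)

    def freeze(self):
        self.frozen = True


in_df = QuotaDataFrame(quota_points=2.5, row_cost=1.0)
in_df.add_row("foo", 17, 99)
in_df.add_row("bar", 42, 1999)
try:
    in_df.add_row("baz", 377, 1483)
    in_df.freeze()
    failed = False
except QuotaExceeded:
    failed = True
```

After the third call fails, the frame holds the first two rows, is still within its quota, and was never frozen.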

Things get more complicated in this situation:

    mutable blocker dfReady;
    dfReady += vectorize myFunction inDf outDf;
    await_all dfReady;

Here many instances of myFunction may already have been scheduled when Níðhöggr is notified that membrane invariants have been violated, and multiple results might already be further exacerbating the violation. Níðhöggr will abandon unscheduled tasks but allow running tasks to complete. The sorting actorcito of the vectorization workflow will be notified by Níðhöggr and will ensure the membrane invariant is preserved.
Thus the only guaranteed invariant for this code that all runtimes must adhere to is that when await_all resolves, it will raise an exception and outDf will no longer violate the membrane invariants, even if allocation might have temporarily violated them. The reason for this is that synchronization needs would otherwise become a performance bottleneck, which the language architect doesn't believe is a price worth paying to avoid a short and likely minor violation of the invariant, unless the number of columns in the dataframe is massive.

Coming up

Again, this post was unplanned. The quota membrane wasn't originally planned as part of the 0.3 version of the spec, but the changes to the merge, function and main syntax impacted the semantic lexer enough to move quota membranes forward from version 0.4 or 0.5 of the language spec to version 0.3.

After this post, my priority goes back to completing InnuenDo VaultFS and the semantic lexer for Merg-E, the parser, and the scheduler pipeline (Yggdrasyl and Níðhöggr) for the development runtime, so expect posts on my progress first before extensions to this series on the v0.3 language specs.
