This is part fifteen in a series on the 0.3 version of the language spec for the Merg-E Domain Specific Language for the InnuenDo Web 3.0 stack. I'll add more parts to the below list as the spec progresses:
- part 1: coding style, files, merging, scoping, name resolution and synchronisation
- part 2: reverse markdown for documentation
- part 3: actors and pools
- part 4: semantic locks, blockers, continuation points and hazardous blockers
- part 5: semantic lexing, DAGs, prune/ent and alias
- part 6: DAGs and DataFrames as only data structures, and inline lambdas for pure compute
- part 7: freezing
- part 8: attenuation, decomposition, and membranes
- part 9: sensitive data in immutables and future vault support
- part 10: scalars and High Fidelity JSON
- part 11: operators, expressions and precedence
- part 12: robust integers and integer bitwidth generic programming
- part 13: the Merg-E ownership model, capture rules, and the --trustmebro compiler flag
- part 14: actorcitos and structural iterators
- part 15: explicit actorcitos, non-inline structural iterators, runtimes, and the abstract scheduler pipeline
- part 16: async functions and resources and the full use of InnuenDo VaultFS
- part 17: RAM-points, the RAM-points normalization bag, and the quota-membrane
- part 18: literal operators & rational and complex numbers
- part 19: interaction between operators, integer bitwidth generics, and the full numeric type-system
In this post we take a deeper look at structural iterators and actorcitos, using concurrency patterns other than those provided by the inline foreach and inline foreachloop expressions. In addition, we take a preliminary look at the parallelism models that future Merg-E runtimes could implement, and we discuss the why of the "no assumptions about parallelism" Merg-E mantra.
Explicit actorcitos
In our previous post we showed how to define special fingerprint functions, referred to as actorcitos, and combine them with an iterable data structure like a string, a dataframe or a DAG to form a structural iterator. Because we used these actorcitos in an inline fashion, the runtime ran everything synchronously, and provided the interaction patterns that combined the results, or the single final result, into a semantic return value. But if we want the actorcito to run asynchronously, and optionally in parallel, we need to do part of the plumbing ourselves.
If we started with:
inert string mixedcase = "Mixed-case String that We are going to iterate over.";
iterator mc_iter = iteratify mixedcase uppercase;
inert string upper = inline foreach mc_iter;
The singular asynchronous version of this will look something like this:
inert string mixedcase = "Mixed-case String that We are going to iterate over.";
borrowed mutable string upper = "";
function myreturn( rval string )::{
upper;
}{
upper = rval;
};
actorcito ac_myfunc = actorcitobind uppercase myreturn;
iterator mc_iter = iteratify mixedcase ac_myfunc;
blocker alldone;
alldone += foreach mc_iter;
await.all alldone;
It's a lot more boilerplate, but we have just decoupled the actorcito from the enclosing execution scope. Everything runs asynchronously but in sequence, processed by a single actorcito running on one of the primitives of execution (more on that later).
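For readers more at home in Python than in Merg-E, the pattern above can be sketched as a rough asyncio analog. This is not Merg-E semantics, just an illustration of the shape of the decoupling: the per-item work runs asynchronously but in sequence, and a separate capture function plays the role of myreturn writing into the borrowed mutable upper. All names here are hypothetical.

```python
import asyncio

# Rough Python-asyncio analog (NOT Merg-E) of the single explicit actorcito:
# the per-item work (uppercase) is decoupled from the enclosing scope and
# runs asynchronously, but still strictly in sequence.
async def run_single_actorcito(source: str) -> str:
    upper = []                      # plays the role of the borrowed mutable `upper`

    def myreturn(rval: str) -> None:
        upper.append(rval)          # return-value capture, like the Merg-E myreturn

    async def uppercase(ch: str) -> str:
        await asyncio.sleep(0)      # yield to the event loop (stand-in for real async work)
        return ch.upper()

    for ch in source:               # one actorcito, items processed in order
        myreturn(await uppercase(ch))
    return "".join(upper)

result = asyncio.run(run_single_actorcito("Mixed-case String"))
print(result)  # MIXED-CASE STRING
```

Because there is only one worker, no sorting is needed: results are captured in exactly the order the iterator supplied them.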
An actorcito pool
While the previous scenario was a bit verbose, it was still very simple without parallelizability concerns. But when we bring in actorcito pools, things become a bit more involved:
inert string mixedcase = "Mixed-case String that We are going to iterate over.";
borrowed mutable string upper = "";
function myreturn( rval character |||
iterate_seq uint64 |||
is_last boolean sort callable<character uint64 boolean> )::{
}{
sort(rval, iterate_seq, is_last)
};
actorcito ac_myfunc = actorcitobind pool<16> uppercase |||
myreturn |||
stringsorter upper;
iterator mc_iter = iteratify mixedcase ac_myfunc;
blocker alldone;
alldone += foreach mc_iter;
await.all alldone;
So what is happening here? This piece of code is arguably the least intuitive bit of Merg-E code we have seen so far in the whole series, but let's walk through it, and see if it's worth the trouble.
One striking thing to notice is that even though we are working with a pool, upper is still borrowed and not shared. Next we see a function that seems pretty useless. It gets a few arguments: the actorcito return character, an iteration sequence number, and a boolean indicating that this was the last invocation. One thing we must note: while results can come in out of order, the invocation with is_last set always arrives last.
So why have this seemingly redundant function? In this case there is absolutely nothing for the function to do other than forward its arguments, but that won't always be the case, and since we are striding through the swamp of low abstraction, the general case gets precedence here.
Now comes the most confusing line:
actorcito ac_myfunc = actorcitobind pool<16> uppercase |||
myreturn |||
stringsorter upper;
With 'pool<16> uppercase' we are hinting to the runtime that we would appreciate a pool of at most 16 actorcito instances, at the runtime's convenience, that we want this pool combined into a new actorcito, and that we want the opportunity for the results to get sorted by a stringsorter, to which we lend our borrowed mutable upper.
This stringsorter is the callable that our myreturn function gets as an argument.
The rest of the code is unchanged from before, just that now we use the actorcito pool instead of a single actorcito.
The big takeaway here is that because the results will arrive at our return-value-capturing function in mostly non-deterministic, unsorted order, we may want to re-order them into the same order the iterator supplied them in.
Other than the stringsorter, we also have a rowsorter for dataframes, and a nullsorter for when we don't need any sorting. There is no DAG sorter because a DAG is by default an unsorted data structure.
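The reordering job a stringsorter does can again be sketched in Python, under the same caveat as before: this is an asyncio analog with hypothetical names, not the Merg-E implementation. Up to 16 workers complete out of order, each result is captured together with its iteration sequence number, and a final sort-by-sequence step reassembles the string in iterator order.

```python
import asyncio
import random

# Hypothetical Python analog (NOT Merg-E) of the pool scenario: up to 16
# workers uppercase characters concurrently, results land out of order as
# (sequence -> char) entries, and a stringsorter-like step reassembles
# them in the order the iterator supplied them.
async def run_pool(source: str, pool_size: int = 16) -> str:
    results: dict[int, str] = {}
    sem = asyncio.Semaphore(pool_size)              # cap concurrency, like pool<16>

    async def uppercase(seq: int, ch: str) -> None:
        async with sem:
            await asyncio.sleep(random.random() / 100)  # scramble completion order
            results[seq] = ch.upper()               # myreturn: capture (rval, iterate_seq)

    await asyncio.gather(*(uppercase(i, ch) for i, ch in enumerate(source)))
    # stringsorter analog: emit the captured results sorted by sequence number
    return "".join(results[i] for i in range(len(source)))

out = asyncio.run(run_pool("Mixed-case String"))
print(out)  # MIXED-CASE STRING
```

A nullsorter would correspond to simply skipping the sort-by-sequence step and accepting whatever order the results arrived in.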
Merg-E in the InnuenDo stack, and the vision of different runtimes
Let's look at a stripped-down version of the InnuenDo stack; to be precise, the part that is currently under active development, or soon will be. We have the port of the Python CoinZdense proof of concept to the C++ libcoinzdense, the port of the HIVE-only Python aiohivebot Web 3.0 bot and backend framework to the multi-chain aiow3, the currently-being-designed Merg-E language, and then, hopefully not far in the future, the Python language bindings for libcoinzdense.
Looking at the image above, we see two arrows coming out of Merg-E: one pointing to the Python language bindings, and one pointing directly to libcoinzdense. These two arrows need a bit more elaboration.
The semantic lexer for Merg-E that I'm currently starting on is meant to be part of a first development and scripting implementation: a slow and barely parallel implementation that is just good enough to develop the language, get the language specs polished, and work out the ambient DAG. This runtime is meant to be an interpreter with built-in runtime, written in Python and running on Python. For parallelism, we just have tasks in the Python asyncio meaning of the word. No threads or thread pools, and certainly nothing fancier.
Then we get to the second arrow, the one going directly from Merg-E to libcoinzdense. This isn't a line from the development runtime to libcoinzdense. Instead we envision a second, real runtime that is linked with both your compiled Merg-E program and an IR of libcoinzdense into a native binary. This runtime will be multi-threaded but still CPU-only, using a temporarily downscalable thread pool to accommodate the bursty CPU needs of the libcoinzdense just-in-time layer-key generation system.
These are the two runtimes that I think together form the MVP for Merg-E: a Python-based, async-task scripting runtime, and a compiled, native-target runtime that is purely CPU. The language design, though, is made in such a way that it should be possible to build two more runtimes, if I ever end up with enough time to write them.
The third envisioned runtime is basically the second runtime, but augmented with GPU usage. I envision that many parallel constructs can map to SYCL kernels, and if these can integrate into the runtime, this could create a significantly more scalable runtime.
A fourth envisioned runtime is one on top of BEAM, the VM that actor-based languages like Erlang and Elixir run on. It would be really valuable to have a Merg-E runtime running on top of BEAM, but I don't expect to find the time for it unless I give up on much of the InnuenDo stack not shown in the above diagram, including the TypeScript language bindings and a TypeScript port of aiow3.
Much of the current language design stems from the fact that I want these four runtimes to be possible, without one being horribly harder to implement than another. The paradigms used aim to be relatively friendly to each of the envisioned runtimes, even the one that is very unlikely to materialize from my own hands.
So summarising, we envision four runtimes:
| runtime | target | concurrency | use | likelihood of completion |
|---|---|---|---|---|
| develop | scripted on top of Python | single-threaded task pools | language development, non-critical scripting, personal bots | very high |
| basic | compiled/linked with program as native code | thread pool, CPU only | Web 3.0 bots, frontends, non-trading bots | high |
| enhanced | compiled/linked with program as native code plus SYCL kernels | thread pool + GPU | trading bots | low/medium |
| beam | the BEAM virtual machine | BEAM processes | highly robust systems, new Web 3.0 L1s | very low/low |
Each runtime will come with its own parallelism primitives, hence the no assumptions mantra for parallelism that I keep pressing.
Abstract scheduling pipeline
While the actual scheduling will be very different between runtimes, at an abstract level a common pattern emerges, at least for the first two implementations. Imagine the event loop is either a Python async task or an actual thread. There is one event loop drawn in the diagram above, but every short queue below it would have its own event loop handling it. The short queues contain scheduler-specific workloads that are basically delayed invocations, or continuations thereof. A workload, like almost anything in Merg-E, is a DAG: a DAG that references the scope of the runnable, a collection of conditions, a collection of dependencies (that in a way are conditions too), a reference to the ancestor runnable, and, very importantly, continuation-point information, and of course code, in whatever meaning that word has for the given runtime.
An event loop picks up a runnable, figures out where it needs to start, and runs it either to completion or until an await is reached somewhere. If completed, the prunable part of the DAG is pruned and the stump, with information for the scheduler, is put into a queue database. This is where the real heavy lifting happens: the queue database is basically an in-memory data structure that acts like a queue towards the event loops, but like a database towards the scheduler.
The queue database basically consists of four zones: the new zone, the unfulfilled zone, the ready zone, and the done zone.
Whenever a runnable completes or an external awaitable resolves, an entry is added to the done zone. The scheduler processes this entry, promoting entries from the unfulfilled zone to the ready zone, and then removes the done entry.
When an entry is added to the new zone (and this includes resubmitted runnables with a continuation point), the scheduler processes the entry and places it in either the ready or the unfulfilled zone.
After processing the done and new zones, if there is room in one or more event-loop queues (these are relatively short queues: big enough not to starve the event loops, small enough to keep the scheduler reactive), the scheduler queries the ready zone of the queue database for the highest-priority runnables for each of the event-loop queues.
These are composite queries taking multiple factors into account: whether or not we are running in panic mode, whether actors, actorcitos, or inline cascades were scheduled to a specific event loop, how long an entry has been in the queue, how many "lock" conditions are fulfilled by choosing a runnable, and more.
While returning these runnables, new locks are processed in the queue database, and some ready-zone entries might get placed back into the unfulfilled zone.
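The zone mechanics described above can be sketched as a minimal Python data structure. This is only an illustration of the flow between zones, with every name hypothetical and the composite priority query reduced to simple FIFO: new entries are triaged into ready or unfulfilled, resolved conditions in the done zone promote unfulfilled entries, and the ready zone feeds the event-loop queues.

```python
from dataclasses import dataclass, field

@dataclass
class Runnable:
    name: str
    conditions: set = field(default_factory=set)  # unresolved conditions/dependencies

# Minimal sketch (names hypothetical) of the four-zone queue database.
class QueueDatabase:
    def __init__(self):
        self.new, self.unfulfilled, self.ready, self.done = [], [], [], []

    def submit(self, runnable):
        self.new.append(runnable)          # new (or resubmitted) runnable arrives

    def resolve(self, condition):
        self.done.append(condition)        # a runnable completed / awaitable resolved

    def process(self):
        # new zone: triage each entry into ready or unfulfilled
        for r in self.new:
            (self.ready if not r.conditions else self.unfulfilled).append(r)
        self.new.clear()
        # done zone: promote unfulfilled entries whose conditions are now met
        for cond in self.done:
            for r in self.unfulfilled[:]:
                r.conditions.discard(cond)
                if not r.conditions:
                    self.unfulfilled.remove(r)
                    self.ready.append(r)
        self.done.clear()                  # done entries are removed after processing

    def pop_ready(self):
        # stand-in for the composite priority query feeding an event-loop queue
        return self.ready.pop(0) if self.ready else None

qdb = QueueDatabase()
qdb.submit(Runnable("a"))                  # no conditions: goes straight to ready
qdb.submit(Runnable("b", {"io-done"}))     # blocked until "io-done" resolves
qdb.process()
first = qdb.pop_ready()
print(first.name)   # a
qdb.resolve("io-done")
qdb.process()
second = qdb.pop_ready()
print(second.name)  # b
```

In the real design the ready-zone query would of course weigh panic mode, event-loop affinity, queue age and lock conditions rather than popping the oldest entry.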
It is important to note that this design is best fitted to actual threads running the event loops; the develop runtime would probably be better off with a single event loop, unless the bot being written is largely IO-dominant. But because we are creating a testing ground for the language, including for the queue database and the scheduler, we choose to implement it with single-threaded asyncio Python tasks.
Coming up
In this post we completed the list of big things needed for a v0.3 language specification, which in turn is needed to start working on the development runtime. We wrapped up the actorcito and structural-iterator paradigm, and we looked into the four envisioned possible runtimes. It is likely that I'll run into loose ends, and the ambient tree is very much unspecified at the moment; for now it's going to grow as I extend the runtime beyond the core language to do useful things and interact with the world. So expect more posts on both.
After this post, my priority is completing the semantic lexer, the parser and the scheduler pipeline for the development runtime, so expect posts on my progress before further extensions to this series on the v0.3 language specs.