Version 0.4 of the Merg-E language specification : Tensors and tensor literals

This is part twenty-one in a series on the language spec for the Merg-E Domain Specific Language for the InnuenDo Web 3.0 stack. Where most previous posts were about v0.3, this is the second post that is about the v0.4 version of the spec. I'll add more parts to the list below as the spec progresses:

part 1 : coding style, files, merging, scoping, name resolution and synchronisation
part 2 : reverse markdown for documentation
part 3 : Actors and pools.
part 4 : Semantic locks, blockers, continuation points and hazardous blockers
part 5 : Semantic lexing, DAGs, prune / ent and alias.
part 6 : DAGs and DataFrames as only data structures, and inline lambdas for pure compute.
part 7 : Freezing
part 8 : Attenuation, decomposition, and membranes
part 9 : Sensitive data in immutables and future vault support.
part 10 : Scalars and High Fidelity JSON
part 11 : Operators, expressions and precedence.
part 12 : Robust integers and integer bitwidth generic programming
part 13 : The Merg-E ownership model, capture rules, and the --trustmebro compiler flag.
part 14 : Actorcitos and structural iterators
part 15 : Explicit actorcitos, non-inline structural iterators, runtimes, and abstract scheduler pipeline.
part 16 : async functions and resources and the full use of InnuenDo VaultFS
part 17: RAM-Points, RAM-points normalization bag, and the quota-membrane.
part 18: Literal operators & Rational and Complex numbers.
part 19: Interaction between operators, integer bitwidth generics, and the full numeric type-system.
part 20 (v0.4): Compile-time dimensional analysis, SI/Planck units and the scaling literal operator.
part 21 (v0.4): Tensors and tensor literals.

In this post, we are going to look deeper into matrices and tensors. In Merg-E, tensors and matrices are closely related but fundamentally distinct. Tensors exist purely at compile time as a rich typing mechanism, resolving to highly optimized runtime scalars, vectors, and n-dimensional matrices.

Tensors can exist stand-alone, or embedded as part of a quantity (as discussed in our previous post). Stand-alone tensors can derive from any numeric type and are meant primarily for use with our five exact type families: whole, int, cint, rational, and crat, predominantly within a cryptography context. (Please note that the v0.4 language spec renames the uint type family to whole, turning the old uint tokens into aliases.)

In contrast, tensors embedded within a quantity are designed specifically for physics-related simulations and calculations. Consequently, they are restricted to deriving exclusively from our two inexact type families: float and complex.

We discussed quantity in depth in the previous post, so in this post we are going to take a deep dive into the tensor and just the tensor. But we'll use quantity in some of the code examples.

Why tensors and why the exact/inexact split?

Merg-E is a Web 3.0 dataflow and crypto DSL. Dataflow often uses physics fields and math. For that reason Merg-E needs the concept of a quantity with float or float-based complex as the underlying atomic numbers, and with dimensional analysis for physics-aligned type safety, as discussed in the previous post. Next to that, crypto often uses matrices and lattices that fit well on tensors and on the type safety Merg-E lets tensors provide, hence stand-alone tensors that use exact types like int and rational as the underlying atomic numeric type. The two are distinct but they overlap.
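To make the split a bit more tangible, here is a minimal Python sketch (Python, not Merg-E) of the rule as described above. The helper names family and atom_allowed are made up for this illustration, the trick of stripping the bit width from a type name is an assumption, and uint32 is only a hypothetical example of an old uint token.

```python
# Illustrative model only; names and the suffix-stripping rule are assumptions.
EXACT = {"whole", "int", "cint", "rational", "crat"}
INEXACT = {"float", "complex"}
ALIASES = {"uint": "whole"}  # v0.4 renames the uint family to whole


def family(atom):
    """Map an atomic type name such as 'float256' to its type family."""
    base = atom.rstrip("0123456789")  # assumption: bit width is a numeric suffix
    return ALIASES.get(base, base)


def atom_allowed(atom, embedded_in_quantity):
    """Quantity-embedded tensors must be inexact; stand-alone may be any numeric type."""
    fam = family(atom)
    if embedded_in_quantity:
        return fam in INEXACT
    return fam in EXACT or fam in INEXACT


print(atom_allowed("float256", embedded_in_quantity=True))
print(atom_allowed("int8", embedded_in_quantity=True))
print(atom_allowed("int8", embedded_in_quantity=False))
```

In this model an int8 tensor is accepted stand-alone but rejected inside a quantity, which is the behaviour the text above describes.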

The general syntax

Let's go back to our example from the previous post, the stress tensor:

mutable quantity::<tensor::<float256, 2, [3,3]>, unit::<sys: SI, mass: 1, length: -1, time: -2>> sigma;

We discussed the unit bit extensively, so let's zoom in to the tensor section and let's expand it a little bit to capture the full type annotation container:

tensor::<float256, 2, [3,3], [symmetric::<[0,1]>]>

Note that the tensor annotation has four parts:

The atomic type the tensor builds on
The tensor rank, 0 for a scalar, 1 for a vector, or in this case 2, for a 2-dimensional matrix, etc.
The shape of the tensor, in this case a 3x3, so we have a 3 by 3 matrix.
A symmetries specification.
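As a rough mental model of these four parts, here is a small Python sketch (not Merg-E). The TensorType class and its field names are invented for this illustration, and the check that the rank must equal the number of shape entries is an assumption that matches the example above rather than a quote from the spec.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class TensorType:
    """Hypothetical model of the four parts of a tensor annotation."""
    atom: str               # atomic type, e.g. "float256"
    rank: int               # 0 scalar, 1 vector, 2 matrix, ...
    shape: tuple            # one size per index, e.g. (3, 3)
    symmetries: tuple = ()  # e.g. (("symmetric", (0, 1)),)

    def __post_init__(self):
        # Assumption: rank and shape must agree.
        if self.rank != len(self.shape):
            raise ValueError("rank must equal the number of shape entries")
        for kind, axes in self.symmetries:
            if kind not in ("symmetric", "asymmetric"):
                raise ValueError("unknown symmetry kind: " + kind)
            if any(a < 0 or a >= self.rank for a in axes):
                raise ValueError("symmetry index outside of tensor rank")

    def element_count(self):
        """Number of atomic values in the dense form (1 for a scalar)."""
        return prod(self.shape)


stress = TensorType("float256", 2, (3, 3), (("symmetric", (0, 1)),))
print(stress.element_count())
```

The frozen dataclass is a deliberate choice here: like the tensor annotation itself, the description is fixed once it is written down.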

Symmetry specifications

The symmetry specification needs a bit of extra attention. It is a collection of zero or more axes of (a)symmetry. In our rank-2 tensor there is only one possible pair of indices over which a symmetry or asymmetry can exist, the 0,1 pair, but a rank-3 or rank-4 tensor can have multiple axes of (a)symmetry. In this case, the stress tensor, the symmetric entry tells us that the stress tensor needs to be symmetric over the 0,1 axis, and that tensors that are asymmetric over that axis can't be assigned to it. Other tensors may define asymmetric instead. In that case tensors that are symmetric on that axis can't be assigned to them. If neither symmetric nor asymmetric is defined, both are permitted and no compile errors will occur on that axis.
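The assignment rule for a rank-2 tensor could be sketched like this in Python (not Merg-E). Two loud assumptions: the sketch checks concrete values where the Merg-E compiler works on types, and it reads asymmetric simply as "not symmetric", which the text above does not spell out. The function names are made up for the illustration.

```python
def is_symmetric(m):
    """True if m[i][j] == m[j][i] for every pair of indices."""
    n = len(m)
    return all(m[i][j] == m[j][i] for i in range(n) for j in range(i + 1, n))


def assignable(declared, m):
    """declared is 'symmetric', 'asymmetric' or None (nothing declared)."""
    symmetric = is_symmetric(m)
    if declared == "symmetric":
        return symmetric
    if declared == "asymmetric":
        return not symmetric  # assumption: asymmetric means "not symmetric"
    return True  # nothing declared: both are permitted


stress = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
skewed = [[0, 1, 2], [9, 0, 3], [2, 3, 0]]

print(assignable("symmetric", stress))
print(assignable("symmetric", skewed))
print(assignable(None, skewed))
```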

Matrix literals

As we discussed a few times before, we reserved the Latin-1 Unicode space for literal operators. These are the operators we need to write special literals in Merg-E. Matrix literals are a little bit more flexible than other literals. They allow all kinds of sparse matrix definitions. A matrix literal is a comma-separated list of matrix chunks, so let's start off by looking at what a simple matrix literal looks like:

[¿,¿] § [[0,1,2],[1,0,3],[2,3,0]]

We can write the same like this:

[0,¿] § [0,1,2],[1,¿] § [¿,0,3],[2,¿] § [¿,¿,0]

Or like this:

matrix::<¿,0> [0,¿] § [0,1,2],[1,¿] § [¿,0,3]

Or a bit more readable with the line concatenation pipes:

matrix::<¿,0> [0,¿] § [0,1,2], |||
              [1,¿] § [¿,0,3]

So let's investigate what is going on.

First let's look at the section operator §. The section operator works on two section lists, the section scope list and the section definition list. Basically it reads like LH scope has RH definition. In the first example the whole matrix is one single scope. In the second and third example each row has its own sub scope.

The second literal operator is the inverted question mark operator ¿. This operator has a different meaning in the scope list than it has in the definition list and in the matrix modifier that we will look at next. In scope context the ¿ is a placeholder for every single value of a given index, so [¿,¿] denotes the scope of the entire rank-2 tensor while [0,¿] denotes the scope of just row 0 of that tensor.

This should explain the first form:

[¿,¿] § [[0,1,2],[1,0,3],[2,3,0]]

This is the compact form: we define the matrix literal as one big non-sparse compact chunk.
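The scope side of the section operator can be modelled in a few lines of Python (not Merg-E). The function scope_indices is a made-up name for this sketch; it only shows which index tuples a scope list covers for a given shape.

```python
from itertools import product

WILDCARD = "¿"  # in scope context: every value of that index


def scope_indices(scope, shape):
    """Expand a scope list such as [0, '¿'] into the index tuples it covers."""
    if len(scope) != len(shape):
        raise ValueError("scope needs one entry per tensor index")
    axes = [range(size) if entry == WILDCARD else [entry]
            for entry, size in zip(scope, shape)]
    return list(product(*axes))


print(scope_indices([0, WILDCARD], (3, 3)))
print(len(scope_indices([WILDCARD, WILDCARD], (3, 3))))
```

For a 3x3 tensor, [0,¿] expands to the three cells of row 0, while [¿,¿] expands to all nine cells.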

On the definition list side, as well as in a matrix modifier, the ¿ operator is the symmetry operator. It is a placeholder that should use symmetry to fill in the actual value. Note that the use of this operator here is only possible if the tensor has a single axis of symmetry. Let's look again at the second form:

[0,¿] § [0,1,2],[1,¿] § [¿,0,3],[2,¿] § [¿,¿,0]

Note there are three chunks, one for each row now. Note that the ¿ operators in the definition list keep us from having to duplicate values that can make use of symmetry.
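How the ¿ placeholders on the definition side get their values can be sketched in Python (not Merg-E) as a simple mirror lookup. This is only a model of the behaviour described above for a rank-2 tensor with a single axis of symmetry; fill_by_symmetry is a hypothetical helper name.

```python
HOLE = "¿"  # in definition context: take the value from the mirrored cell


def fill_by_symmetry(rows):
    """Replace every HOLE at (i, j) with the value given at (j, i)."""
    n = len(rows)
    out = [list(row) for row in rows]
    for i in range(n):
        for j in range(n):
            if rows[i][j] == HOLE:
                mirror = rows[j][i]
                if mirror == HOLE:
                    raise ValueError("no value to mirror for cell (%d, %d)" % (i, j))
                out[i][j] = mirror
    return out


second_form = [[0, 1, 2],
               [HOLE, 0, 3],
               [HOLE, HOLE, 0]]
print(fill_by_symmetry(second_form))
```

In this model the result is the same dense matrix as the compact first form, which is the whole point of the placeholder.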

Now for the third form:

matrix::<¿,0> [0,¿] § [0,1,2],[1,¿] § [¿,0,3]

In this form the literal is prefixed with a matrix modifier that modifies the entire expression. The matrix modifier has two attributes:

The default value for non axis-of-symmetry values
The default value for axis-of-symmetry values.

The use of this modifier in this case allows us to expand on the sparseness of the tensor. We don't need to define row 2: the whole [2,¿] § [¿,¿,0] chunk from the second example can be inferred from the two defaults.
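The way the two defaults and the row chunks combine can be sketched in Python (not Merg-E). The function expand and its parameters are made up for this illustration, and the order of the steps (defaults first, then chunks, then symmetry) is an assumption about how such a literal could be resolved, not a description of the actual compiler.

```python
HOLE = "¿"


def expand(n, off_axis_default, on_axis_default, chunks):
    """chunks maps a row index to the full definition list of that row."""
    # Step 1: start from the two defaults of the matrix modifier.
    grid = [[on_axis_default if i == j else off_axis_default
             for j in range(n)] for i in range(n)]
    # Step 2: overlay the rows that the literal defines explicitly.
    for i, row in chunks.items():
        grid[i] = list(row)
    # Step 3: resolve the remaining placeholders through symmetry.
    for i in range(n):
        for j in range(n):
            if grid[i][j] == HOLE:
                if grid[j][i] == HOLE:
                    raise ValueError("cell (%d, %d) can't be inferred" % (i, j))
                grid[i][j] = grid[j][i]
    return grid


# Model of: matrix::<¿,0> [0,¿] § [0,1,2],[1,¿] § [¿,0,3]
third_form = expand(3, HOLE, 0, {0: [0, 1, 2], 1: [HOLE, 0, 3]})
print(third_form)
```

Row 2 is never mentioned in the chunks; it starts out as the defaults [¿,¿,0] and is then completed from rows 0 and 1.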

Bringing it together

Let's bring this back to our original example line, and change from mutable to immutable:

inert quantity::<tensor::<float256, 2, [3,3], [symmetric::<[0,1]>]>, unit::<sys: SI, mass: 1, length: -1, time: -2>> sigma = matrix::<¿,0> [0,¿] § [0,1,2],[1,¿] § [¿,0,3];

It's all a bit verbose, still, but the type safety this all brings should be worth the verbosity.

Stand-alone tensors

inert tensor::<int8, 2, [6,6], [symmetric::<[0,1]>]> identity6 = matrix::<0,1>;

Where quantities are float or complex based, stand-alone tensors are meant for precise types, like int in this example. In Merg-E these are mostly meant for cryptography. This example shows an int8-based tensor that forms the 6x6 identity matrix.
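Why two defaults without any chunks are enough for an identity matrix can be checked with a short Python sketch (not Merg-E). The helpers from_defaults and matmul are hypothetical names for this illustration, and the probe matrix is just arbitrary test data that stays inside the int8 range.

```python
def from_defaults(n, off_axis, on_axis):
    """Build an n x n matrix from only the two modifier defaults."""
    return [[on_axis if i == j else off_axis for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain exact integer matrix product for square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


identity6 = from_defaults(6, 0, 1)  # model of matrix::<0,1> on a [6,6] tensor
probe = [[(i * 6 + j) % 127 for j in range(6)] for i in range(6)]

print(matmul(identity6, probe) == probe)
```

Because the values are exact integers, the comparison is an exact equality; there is no rounding tolerance involved as there would be with a float-based tensor.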

Tensors vs matrices

The core distinction between tensors and matrices in Merg-E is that tensors are strictly compile-time constructs; they have no runtime footprint. The compiler uses tensors to enforce strict type-safety, perform dimensional analysis, validate symmetry, and parse rich sparse literals. Once these checks pass, the tensor completely dissolves, folding its data into standard runtime scalars, flat vectors, or multi-dimensional matrices.

Conclusion

In this post we discussed an important part of the v0.4 language spec: tensors and tensor literals. We looked into basic symmetries as type-system invariants that increase type safety, and we discussed sparse vs dense literals.

I'll keep working on the v0.4 spec, but my current priority lies with the implementation of the testing runtime, so don't expect many posts in this series in the coming months.
