This is part twenty in a series on the language spec for the Merg-E Domain Specific Language for the InnuenDo Web 3.0 stack. Where previous posts were about v0.3, this is the first post that is about the v0.4 version of the spec. I'll add more parts to the below list as the spec progresses:
- part 1 : coding style, files, merging, scoping, name resolution and synchronisation
- part 2 : reverse markdown for documentation
- part 3 : Actors and pools.
- part 4 : Semantic locks, blockers, continuation points and hazardous blockers
- part 5 : Semantic lexing, DAGs, prune / ent and alias.
- part 6 : DAGs and DataFrames as only data structures, and inline lambdas for pure compute.
- part 7 : Freezing
- part 8 : Attenuation, decomposition, and membranes
- part 9 : Sensitive data in immutables and future vault support.
- part 10 : Scalars and High Fidelity JSON
- part 11 : Operators, expressions and precedence.
- part 12 : Robust integers and integer bitwidth generic programming
- part 13 : The Merg-E ownership model, capture rules, and the --trustmebro compiler flag.
- part 14 : Actorcitos and structural iterators
- part 15 : Explicit actorcitos, non-inline structural iterators, runtimes, and abstract scheduler pipeline.
- part 16 : async functions and resources and the full use of InnuenDo VaultFS
- part 17: RAM-Points, RAM-points normalization bag, and the quota-membrane.
- part 18: Literal operators & Rational and Complex numbers.
- part 19: Interaction between operators, integer bitwidth generics, and the full numeric type-system.
- part 20 (v0.4): Compile-time dimensional analysis, SI/Planck units and the scaling literal operator.
- part 21 (v0.4): Tensors and tensor literals.
The step from language-spec v0.3 to language-spec v0.4
WARNING: THIS POST IS A v0.4 LANGUAGE SPEC POST, IT IS "NOT" PART OF THE v0.3 LANGUAGE SPEC!!
This is the first post about the v0.4 version of the Merg-E language, written while I'm still very much busy implementing a scripted, testing-only runtime for the v0.3 version. So before we start this post, it is important to outline what the v0.4 spec is supposed to be:
- An update and extension of the v0.3 language spec with anything in the language that testing with the v0.3 based testing runtime reveals to be wrong or inconvenient.
- An extension of the DAG/dataframe only approach to structure with numeric-only n-dimensional matrices meant for crypto, supporting all seven Merg-E numeric type families.
- A float/complex only extension of the Merg-E type-system with compile-time dimensional analysis as type-safety feature for physical quantities.
- Seamless integration of physical quantities and matrices math through the use of tensors.
That will be the full scope of the v0.4 spec. No big collection of new features, just fixes for stuff we will run into once (or sometimes before) we have the v0.3 testing runtime, and two symbiotic new big language features.
post scope
We won't go deep into the subject of tensors in this post. That is something for a later post, partly because it is a subject that needs substantially more work. We will lightly touch on tensors though, because they are part of the syntax for the subject I want to discuss in this post: physical quantities.
dipping our toe into tensors.
But let's get the relevant part of the subject of tensors out of the way first. You can often think of a tensor as an N-dimensional matrix. We will discuss the difference between matrices and tensors in a future post. The two aspects of tensors that are relevant for now are that a rank-0 tensor is basically a scalar, and that in physics it is very common to use higher rank tensors instead. As such it makes sense to define scalar quantities as rank-0 tensors with what is called a dimensional units vector.
Unitless numbers
Let's dive into the deep end and just show some syntax.
Let's start by looking at something that is just a number, no units, not anything physical:
inert float256 pi = 3.14159265358979323846264338327950288419716939937510582097494459230781640;
NOTE: Whether this is still valid syntax for v0.4 is currently under consideration.
We are flexing the large float a little bit here, so let's go back a few notches and use a less precise version:
inert float16 pi = 3.140625;
From v0.4 of the language spec (if it remains valid syntax), this is equivalent to :
inert quantity::<tensor::<float16,0>,unit::<>> pi = 3.140625;
Or, using convenience aliases:
inert quantity::<tscalar.float16, units.none> pi = 3.140625;
The quantity part in this case is a compile-time only safety annotation that tells the compiler exactly what type of quantity the number represents so it can make absolutely sure never to assign it to something that is expected to contain a completely different type of quantity.
In this case, because the tensor rank parameter is set to 0 (a scalar), this metadata is completely erased after a successful compilation, leaving a raw, zero-overhead primitive. If the rank is greater than zero, the type system instructs the compiler to condense the quantity into a concrete runtime N-dimensional matrix. But for scalars, it vanishes entirely.
Note that tscalar.float16 is a convenient alias for tensor::<float16,0> and units.none is a convenient alias for unit::<>. One further thing to note early is that idiomatically we always write unit::<> and never unit::<sys: SI> or unit::<sys: PLANCK>, more on this later.
Implicit dimensionless floats?
The sample:
inert float16 pi = 3.140625;
has no dimensions. In this case that is intentional, but for a language that aims to adhere to the safety by default mantra, the question is if it is safe to have dimensionless floats by default as an option. The alternative is quite a bit more verbose:
inert quantity::<tscalar.float16, units.none> pi = 3.140625;
Whether we want to allow the first variant in v0.4 is currently under consideration. If we don't, a less verbose nondimensionality shortcut syntax would seem like a good compromise. Maybe something like
inert dimensionless::<float16> pi = 3.140625;
This is not currently part of the v0.4 language spec, but it is under consideration.
Base units
The second parameter is the more important one here. The unit of a quantity is there to distinguish a pure unitless quantity like pi from something like your height:
inert quantity::<tensor::<float16, 0>, units.meter> height = 1.825;
Please note that here units.meter is only an alias for the actual units used:
inert quantity::<tensor::<float16,0>,unit::<sys: SI, length: 1>> height = 1.825;
The sys parameter here grounds our units into the SI system of units.
Planck units
If you are doing actual physics, much of it becomes a lot simpler when, instead of SI units, you scale all units in such a way that the important constants of nature all become 1. The Planck system of units does just that. In Planck units, a constant like c, the speed of light, is simply 1: one Planck unit of distance per Planck unit of time. Planck units tend to be tiny, with one notable exception: the Planck temperature, which is absolutely massive. We can define quantities in Planck units by setting sys to PLANCK:
inert quantity::<tensor::<float16,0>, unit::<sys: PLANCK, length: 1, time: -1>> speed = 0.07;
Here we set the speed of our spaceship to 7% of the speed of light. Note that in Merg-E Planck units and SI units don't mix. No automatic conversions or anything smart. The type system just keeps you from accidentally assigning a speed in Planck units to a speed in SI units, or from multiplying Planck quantities by SI quantities. Many of these things are conceptually possible in systems that do conversions, but that is not what the compile-time quantity annotations are for. They are for extreme type safety for physical quantities. It's a dumb but incredibly safe language feature.
Now it should become clear why it is idiomatic to use unit::<> and never unit::<sys: SI> or unit::<sys: PLANCK> for dimensionless quantities. The type system makes it OK to do math with systemless dimensionless quantities, and regular floats and complex floats translate to these, but dimensionless quantities with a system can only be used with dimensionful entities of that same system.
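The mixing rule described above can be sketched as a small function. This is a Python model, not Merg-E, and it checks at run time what the Merg-E compiler would check at compile time. The function name `combined_system`, the use of `None` for the systemless `unit::<>`, and the choice that a systemless operand adopts the system of the other operand are all assumptions made for illustration.

```python
from typing import Optional


def combined_system(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Return the unit system of a product or quotient of two quantities.

    None models the systemless unit::<> (hypothetical modelling choice).
    Raises TypeError when two different systems would be mixed.
    """
    if a is None:
        return b  # systemless operand: the other side decides
    if b is None:
        return a
    if a != b:
        raise TypeError(f"cannot mix {a} and {b} quantities")
    return a


# Systemless dimensionless values combine with anything.
assert combined_system(None, "SI") == "SI"
assert combined_system("PLANCK", None) == "PLANCK"
assert combined_system(None, None) is None

# Two different systems never combine.
try:
    combined_system("SI", "PLANCK")
except TypeError as err:
    print("rejected:", err)
```

Under this model a dimensionless value tagged with `sys: SI` would be rejected next to any PLANCK quantity, which is the reason for preferring the systemless `unit::<>` for plain numbers.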
Composite units
We can take this one step further. In many cases a quantity will be in some composite set of units. For example, the gravity on our planet is an acceleration measured in meters per second squared:
inert quantity::<tensor::<float16, 0>,unit::<sys: SI, length:1, time:-2>> g = 9.80665;
Because quantity gives us compile-time annotations and compile-time dimensional analysis, we are extending type safety into the physical world. You cannot accidentally assign a length to an acceleration, or a dimensionless number like pi to a speed.
When you multiply a length by a length you get a surface area, and when you divide a speed by an amount of time you get an acceleration. All of this has meaning only at compile time; there is zero runtime cost to the quantity sub-system. If it compiles, it is dimensionally consistent: if you accidentally multiply a speed by time instead of dividing it by time, you get a distance, and trying to assign a distance to an acceleration will lead to a compile error.
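The bookkeeping behind this can be sketched as arithmetic on exponent vectors: multiplying quantities adds the exponents of their units, dividing subtracts them, and an assignment is only accepted when the vectors are identical. The Python below is a run-time model of that idea under stated assumptions, not the Merg-E implementation; the class `Unit`, the helper `assignable`, and the restriction to just length and time are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical stand-in for unit::<sys: ..., length: ..., time: ...>."""
    sys: str
    length: int = 0
    time: int = 0

    def _same_system(self, other: "Unit") -> None:
        if self.sys != other.sys:
            raise TypeError("unit systems do not mix")

    def __mul__(self, other: "Unit") -> "Unit":
        # Multiplication adds exponents.
        self._same_system(other)
        return Unit(self.sys, self.length + other.length, self.time + other.time)

    def __truediv__(self, other: "Unit") -> "Unit":
        # Division subtracts exponents.
        self._same_system(other)
        return Unit(self.sys, self.length - other.length, self.time - other.time)


def assignable(target: Unit, value: Unit) -> bool:
    """An assignment is accepted only when both unit vectors are identical."""
    return target == value


METER = Unit("SI", length=1)
SECOND = Unit("SI", time=1)
SPEED = METER / SECOND                        # length: 1, time: -1
ACCELERATION = Unit("SI", length=1, time=-2)  # same unit as g in the post

assert METER * METER == Unit("SI", length=2)         # length * length = area
assert assignable(ACCELERATION, SPEED / SECOND)      # speed / time = acceleration
assert SPEED * SECOND == METER                       # speed * time = length
assert not assignable(ACCELERATION, SPEED * SECOND)  # the compile error case
```

In Merg-E all of this would happen during compilation and leave nothing behind at run time; the Python model only makes the rule visible.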
Please note that quantities in Merg-E are a safety feature, not a compatibility feature. As you may have noted, a unit has a sys field. In Merg-E these systems are mutually exclusive: transformations between SI units and Planck units are NOT supported, and Planck units and SI units don't mix. Merg-E is a DSL, not an all-purpose language, so pick your system of units and stick to it. Merg-E also doesn't support any non-base-10 systems of units. So no stone, ounce or inch.
Scaling literals
Quantities are expected to always be in the system's actual units, but notationally this can be a bit of a pain. To alleviate this pain a little, we define the literal operator thorn ( þ ) as the literal scaling operator.
inert quantity::<tensor::<float256, 0>, units.meter> tinybit = 1.437 þ nano;
This basically resolves to:
inert quantity::<tensor::<float256, 0>, units.meter> tinybit = 0.000000001437;
Let's look into what þ actually does. It takes the floating point number on the left and scales it with a power of ten. The token nano in this case will resolve to the constant -9:
inert quantity::<tensor::<float256, 0>, units.meter> tinybit = 1.437 þ -9;
For SI units you will usually want to use the standard prefix names:
| scale | power of 10 |
|---|---|
| quecto | -30 |
| ronto | -27 |
| yocto | -24 |
| zepto | -21 |
| atto | -18 |
| femto | -15 |
| pico | -12 |
| nano | -9 |
| micro | -6 |
| milli | -3 |
| kilo | 3 |
| mega | 6 |
| giga | 9 |
| tera | 12 |
| peta | 15 |
| exa | 18 |
| zetta | 21 |
| yotta | 24 |
| ronna | 27 |
| quetta | 30 |
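The behaviour of þ described above, a literal on the left scaled by a power of ten on the right, can be modelled in a few lines. This is a Python sketch under assumptions, not how Merg-E implements the operator: the function name `thorn` and the `SCALES` dictionary (a subset of the table) are hypothetical, and `Decimal` is used only so that the scaled value stays exact.

```python
from decimal import Decimal
from typing import Union

# Subset of the scale names from the table above (hypothetical dictionary).
SCALES = {"nano": -9, "micro": -6, "kilo": 3, "mega": 6, "quetta": 30}


def thorn(literal: str, scale: Union[str, int]) -> Decimal:
    """Model of `literal þ scale`: multiply the literal by 10 ** exponent.

    A named scale is first resolved to its exponent, as nano resolves to -9.
    """
    exponent = SCALES[scale] if isinstance(scale, str) else scale
    return Decimal(literal) * (Decimal(10) ** exponent)


# The three spellings of tinybit from the post denote the same value.
assert thorn("1.437", "nano") == thorn("1.437", -9)
assert thorn("1.437", -9) == Decimal("0.000000001437")
```

Because a named scale is nothing more than an integer exponent, any integer can be used where a name from the table would go.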
But for day-to-day quantities, most Planck units will fall outside the range of the SI-defined scaling factors. So while we can use SI scalers, we will need to revert to numeric values if, for example, we want to describe visible sizes in Planck units:
inert quantity::<tensor::<float256, 0>, unit::<sys: PLANCK, length: 1>> km = 61871424.1 þ quetta;
This would be 1 km in Planck units, using the biggest scaler that SI scaling allows us. While still readable, constructs like that lose their practicality when we, for example, express an hour in Planck time units. So let's see how we can express the km example more clearly with numerics:
inert quantity::<tensor::<float256, 0>, unit::<sys: PLANCK, length: 1>> km = 61.8714241 þ 36;
Or for our hour example:
inert quantity::<tensor::<float256, 0>, unit::<sys: PLANCK, time: 1>> hour = 66.775981421438 þ 45;
A peek at higher rank tensors
We have been seeing the whole tensor::<float256, 0> stuff throughout this post. While we aren't going to deep dive into higher rank tensors, it seems only fair to give a little peek under the hood at physical quantities that require a higher rank than the rank-0 tensors we have used so far, which are basically just scalars with what looks like a lot of syntactic noise. So let's justify the noise.
Let's start by explaining that tensor::<float256, 0> is actually short for tensor::<float256, 0, []>, where the three parameters are:
- atomic numeric type in the tensor
- rank of the tensor
- shape of the tensor
Because a rank-0 scalar has no shape, we can omit the third parameter there. Not so for higher rank tensors. An example:
mutable quantity::<tensor::<float256, 2, [3,3]>, unit::<sys: SI, mass: 1, length: -1, time: -2>> sigma;
We are not going to go into the physics of it, but the above defines a mutable quantity sigma that can hold a so-called stress tensor. This is basically a 3x3 matrix of values that all share the same unit and together form one physical quantity describing the internal forces and pressures in a material. The important part is that here we define a rank-2 tensor with a 3x3 shape, which we will explore assigning matrix values to and doing calculations with in a later post.
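The Planck-unit values used in the scaling examples in this post (one kilometre and one hour) can be sanity-checked with plain arithmetic: divide the SI value by the size of the corresponding Planck unit in SI. The sketch below is Python, not Merg-E, and assumes the CODATA 2018 values for the Planck length and Planck time; the post does not state which values it used, so only approximate agreement is checked.

```python
import math

# Assumed CODATA 2018 values; the post does not say which values it used.
PLANCK_LENGTH_M = 1.616255e-35  # metres per Planck length
PLANCK_TIME_S = 5.391247e-44    # seconds per Planck time


def to_planck(si_value: float, planck_unit_in_si: float) -> float:
    """Express an SI value as a count of the matching Planck unit."""
    return si_value / planck_unit_in_si


km_in_planck = to_planck(1000.0, PLANCK_LENGTH_M)  # one kilometre
hour_in_planck = to_planck(3600.0, PLANCK_TIME_S)  # one hour

# Compare against the literals from the post, written as mantissa * 10**n.
assert math.isclose(km_in_planck, 61.8714241e36, rel_tol=1e-4)
assert math.isclose(hour_in_planck, 66.775981421438e45, rel_tol=1e-4)

print(f"1 km   = {km_in_planck:.6e} Planck lengths")
print(f"1 hour = {hour_in_planck:.6e} Planck times")
```

The trailing digits of such literals depend on which published values of the Planck units are taken as the starting point, so the comparison uses a relative tolerance rather than exact equality.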
Conclusion
In this post we discussed the first v0.4 language spec part in this series: compile-time quantity type annotations, a very useful additional safety feature for the Merg-E type system. We lightly touched on tensors, a subject I'll be writing more about once the spec fully crystallizes. I hope the loose mention didn't cause any confusion, but if it did, future posts in this series should clarify things. We also introduced a new literal operator, the scaling operator þ, and showed its use.
I'll keep working on the v0.4 spec, but my current priority lies with the implementation of the testing runtime, so don't expect many posts in this series in the coming months.