This is part nineteen in a series on the 0.3 version of the language spec for the Merg-E Domain Specific Language for the InnuenDo Web 3.0 stack. I'll add more parts to the below list as the spec progresses:
- part 1 : coding style, files, merging, scoping, name resolution and synchronisation
- part 2 : reverse markdown for documentation
- part 3 : Actors and pools.
- part 4 : Semantic locks, blockers, continuation points and hazardous blockers
- part 5 : Semantic lexing, DAGs, prune / ent and alias.
- part 6 : DAGs and DataFrames as only data structures, and inline lambdas for pure compute.
- part 7 : Freezing
- part 8 : Attenuation, decomposition, and membranes
- part 9 : Sensitive data in immutables and future vault support.
- part 10 : Scalars and High Fidelity JSON
- part 11 : Operators, expressions and precedence.
- part 12 : Robust integers and integer bitwidth generic programming
- part 13 : The Merg-E ownership model, capture rules, and the --trustmebro compiler flag.
- part 14 : Actorcitos and structural iterators
- part 15 : Explicit actorcitos, non-inline structural iterators, runtimes, and abstract scheduler pipeline.
- part 16 : async functions and resources and the full use of InnuenDo VaultFS
- part 17: RAM-Points, RAM-points normalization bag, and the quota-membrane.
- part 18: Literal operators & Rational and Complex numbers.
- part 19: Interaction between operators, integer bitwidth generics, and the full numeric type-system.
In this post I want to elaborate a bit on the interaction between numeric types and operators.
While much of what follows covers (incomplete) implementation details of built-in runtime operators. Details not fully part of the language spec, because they are mostly implementation details, I felt it necessary to outline these interactions following a discussion on the FRIAM Google Group.
Let's do a quick recap of the numeric types.
Runtime numeric types
We have a collection of "runtime numeric" type families that come in different sizes. Many sizes are too big to map directly to hardware numeric types.
- Whole numbers: uint1, uint2, uint4, uint8, uint16, uint32, uint64, uint128, uint256, uint512, uint1024, uint2048, uint4096. uint8192 and uint16384
- Integers: int2, int4, int8, int16, int32, int64, int128, int256, int512, int1024, int2048, int4096. int8192 and int16384
- Integer complex numbers: cint16, cint32, cint64, cint128, cint256, cint512, cint1024, cint2048, cint4096. cint8192, cint16384 and cint32768
- Rational numbers: rational16, rational32, rational64, rational128, rational256, rational512, rational1024, rational2048, rational4096. rational8192, rational16384 and rational32768
- Rational complex numbers: crat32, crat64, crat128, crat256, crat512, crat1024, crat2048, crat4096. crat8192, crat16384, crat32768 and crat65536
- Irrational numbers: float16, float32, float64, float128 and float256
- Irrational complex numbers: complex32, complex64, complex128, complex256 and complex512
Virtual numeric types
On top of these runtime types that only exist at power of two bitwidth sizes, we have a wide range of compile time virtual numeric types. For the v0.3 version of the Merg-E spec, these types have single expression resolve time, meaning that they need to resolve to runtime types within a single expression line.
- Whole numbers: Vuint1 .. Vuint32768 in steps of 1
- Integers: Vint2..Vint32768 in steps of 1
- Integer complex numbers: Vcint16 .. Vcint65536 in steps of 2
- Rational numbers: Vrational16 .. Vrational65536 in steps of 2
- Rational complex numbers: Vcrat32 .. vcrat131072 in steps of 4
Generic numeric types
And finally, just for the implementation of (custom and language native) operators, we have generic numeric types that can represent any of the above:
- Whole numbers: guint
- Integers: gint
- Integer complex numbers: gcint
- Rational numbers: grational
- Rational complex numbers: gcrat
- Irrational numbers: gfloat
- Irrational complex numbers: gcomplex
Safety & preserved fidelity by default
Merg-E tries to be safe by default, and that safety by default includes preserving both overflow/underflow safety and guaranteed preservation of fidelity by operators by default. To make this possible, multiple implementations of an operator are often needed to cover all possible combinations of numeric types, and to always return the appropriate safe virtual or runtime type for the result that is guaranteed not to overflow or underflow, and will preserve the fidelity of the least fidelity operator capture.
The implementation has integer bitwidth generics math at its disposal to implement these properties. But let's look at one built-in operator and see what it should do with two numeric values.
The below table shows the operator, the type of the left capture, and the type of the right capture. It can do one of three things:
- Use the default of promoting the result type to a width that is guaranteed to preserve the invariants without underflows, overflows or reduced fidelity.
- Swap the left capture with the right capture and try again
- Demote because a limited fidelity irrational makes promotion futile.
If promotion is the available behaviour, the user can use the nopromote modifier in the operator expression, exposing the expression to runtime errors in case of overflow and underflow. If next to the nopromote a hazardous modifier is used, the overflow or underflow is considered proper behaviour by the user, and no runtime errors are thrown.
Here is the table for the '+' operator, that is for the + operator to implement:
| operator | LC type | RC Type | behaviour | (virtual) result type | nopromote |
|---|---|---|---|---|---|
| + | uint<N> | uint<M> | promote | Vuint<max(N,M)+1> | uint<max(N,M> |
| + | uint<N> | int<M> | promote | Vint<max(N,M-1)+2> | Vint<M> |
| + | uint<N> | cint<2*M> | promote | Vcint<2*(max(N,M-1)+2)> | Vcint<2*M> |
| + | uint<N> | rational<2*M> | promote | Vrational<2*(max(N,M-1)+2)> | Vrational<2*M> |
| + | uint<N> | crat<4*M> | promote | Vcrat<4*(max(N,M-1)+2)> | Vcrat<4*M> |
| + | uint<N> | float<M> | demote | float<M> | |
| + | uint<N> | complex<2*M> | demote | complex<2*M> | |
| + | int<N> | uint<M> | swap/again | ||
| + | int<N> | int<M> | promote | Vint<max(N-1,M-1)+2> | Vint<max(N,M)> |
| + | int<N> | cint<2*M> | promote | Vcint<2*(max(N-1,M-1)+2)> | Vcint<2*M> |
| + | int<N> | rational<2*M> | promote | Vrational<2*(max(N-1,M-1)+2)> | Vrational<2*M> |
| + | int<N> | crat<4*M> | promote | Vcrat<4*(max(N-1,M-1)+2)> | Vcrat<4*M> |
| + | int<N> | float<M> | demote | float<M> | |
| + | int<N> | complex<2*M> | demote | complex<2*M> | |
| + | cint<2*N> | uint<M> | swap/again | ||
| + | cint<2*N> | int<M> | swap/again | ||
| + | cint<2*N> | cint<2*M> | promote | Vcint<2*(max(N-1,M-1)+2)> | Vcint<2*max(N,M)> |
| + | cint<2*N> | rational<2*M> | promote | Vcrat<4*(max(N-1,M-1)+2)> | Vcrat<4*max(N,M)> |
| + | cint<2*N> | crat<4*M> | promote | Vcrat<4*(max(N-1,M-1)+2)> | Vcrat<4*max(N,M)> |
| + | cint<2*N> | float<M> | demote | complex<2*M> | |
| + | cint<2*N> | complex<2*M> | demote | complex<2*M> | |
| + | rational<2*N> | uint<M> | swap/again | ||
| + | rational<2*N> | int<M> | swap/again | ||
| + | rational<2*N> | cint<2*M> | swap/again | ||
| + | rational<2*N> | rational<2*M> | promote | Vrational<2*max(N,M)+2> | Vrational<2*max(N,M)> |
| + | rational<2*N> | crat<4*M> | promote | Vcrat<4*(max(N,M)+1)> | Vcrat<4*max(N,M)> |
| + | rational<2*N> | float | demote | float<M> | |
| + | rational<2*N> | complex<2*M> | demote | complex<2*M> | |
| + | crat<4*N> | uint<M> | swap/again | ||
| + | crat<4*N> | int<M> | swap/again | ||
| + | crat<4*N> | cint<2*M> | swap/again | ||
| + | crat<4*N> | rational<2*M> | swap/again | ||
| + | crat<4*N> | crat<4*M> | promote | Vcrat<4*(max(N,M)+1)> | Vcrat<4*max(N,M)> |
| + | crat<4*N> | float<M> | demote | complex<2*M> | |
| + | crat<4*N> | complex<2*M> | demote | complex<2*M> | |
| + | float<N> | uint<M> | swap/again | ||
| + | float<N> | int<M> | swap/again | ||
| + | float<N> | cint<2*M> | swap/again | ||
| + | float<N> | rational<2*M> | swap/again | ||
| + | float<N> | crat<4*M> | swap/again | ||
| + | float<N> | float<M> | demote | float<min(N,M)> | |
| + | float<N> | complex<2*M> | demote | complex<2*min(N,M)> | |
| + | complex<2*N> | uint<M> | swap/again | ||
| + | complex<2*N> | int<M> | swap/again | ||
| + | complex<2*N> | cint<2*M> | swap/again | ||
| + | complex<2*N> | rational<2*M> | swap/again | ||
| + | complex<2*N> | crat<4*M> | swap/again | ||
| + | complex<2*N> | float<M> | swap/again | ||
| + | complex<2*N> | complex<2*M> | demote | complex<2*min(N,M)> |
How to read the table
There are basically three types of line in the table:
- promotion lines : These normally result in wider result types
- swap/again lines : These are there for DRY reasons id T1 + T2 is the same as T2 + T1
- demotion lines : These deal with limited fidelity (float based) types.
Promotion lines
Here is an example of a promotion line. We are adding up an N bits wise unsigned integer and an integer based complex number that is constructed out of two M wide signed integers.
| operator | LC type | RC Type | behaviour | (virtual) result type | nopromote |
|---|---|---|---|---|---|
| + | uint<N> | cint<2*M> | promote | Vcint<2*(max(N,M-1)+2)> | Vcint<2*M> |
The expression Vcint<2*(max(N,M-1)+2)> should be read as:
- This function returns a virtual Integer complex number with a bit width of 2*(max(N,M-1)+2) bits.
The max(N,M-1) part tells the implementation to start off comparing the two core sizes without the sign bit, and take the bigest of the two. The the +2 adds a bit back for the sign, plus one bit to prevent overflows. Finally, because the result is complex, we multiply by two and get the proper Vcint bitwidth.
Imagine we pick uint16 and cint32, both N and M will be 16. When we fill in these numbers we get a Vcint36. A Vcint36 won't fit into a cint32, it needs a cint64, what is quite a bit bigger. If we know what we are doing we can add the nopromote modifier after the (sub) expression:
inert uint16 a = 1000;
inert cint32 b = 783 ¡ 1933 nopromote;
inert cint32 c = a + b;
While this won't have fitted withour nopromote, the use of the modifier prompts the operator to look at the nopromote entry in the table: Vcint<2*M>
Basically we take the virtual variant of the right capture, the complex number. Nothing smart, simply the type of the side that has the ability but not necessarily the capacity to store the result.
Swap again lines
| operator | LC type | RC Type | behaviour | (virtual) result type | nopromote |
|---|---|---|---|---|---|
| + | cint<2*N> | uint<M> | swap/again |
We showed how to add a cint to an uint, but adding an uint to a cint is exactly the same. This isn't true for all operators, it is for + and * but not for - and / .As we don't want to repeat ourselves and we already have an implementation for uintN + cintM, the above line tells the system: don't mind there is no match for a + b, make it b + a and try again.
demotion lines
| operator | LC type | RC Type | behaviour | (virtual) result type | nopromote |
|---|---|---|---|---|---|
| + | uint<N> | float<M> | demote | float<M> |
A third type of line in the table are the demotion lines. If one of the two types involved has lower fidelity than the other one, any more fidelity than the worst one is considered false fidelity. As such in these operations Merg-E aims to demote the result to the type with the most suitable fidelity.
In the above line we see that if we add any unsigned integer to any floating point number, the result type is the type of the floating point number. It is important to note that where full fidelity types can be assigned to wider versions of their type but not narrower ones, for floating point types the inverse is true as we dont want to introduce false fidelity.
So this is default:
inert float32 a = 3.1415727;
inert uint8 b = 1;
inert float32 c = a + b;
This is valid, even if we erase fidelity:
inert float32 a = 3.1415727;
inert uint8 b = 1;
inert float16 c = a + b;
But if we want to increase fidelity we need to use the hazardous modifier with our operator:
inert float32 a = 3.1415727;
inert uint8 b = 1;
inert float64 c = a + b hazardous;
A full example
inert crat64 b = 19 + 88 + 0 ¡ 41 + 48 ÷ 256;
Merg-E evaluates same prededence operators left to right, and numeric literals start out as virtual typed, so this evaluates as:
Let's zoom in:
- "19" has no minus, so the literal is unsigned. We need 5 bits to represent 19, so it's a Vuint5
- "88" has no minus, so the literal is unsigned. We need 7 bits to represent 88, so it's a Vuint7
- "0 ¡ 41" consists of integer values, so its a cint (integer complex number) consisting of two signed integers. 41, the biggest of the two numbers needs 7 bits to represent, times two means it's a Vcint14.
- "48 ÷ 256" is a rational literal. Compile time available rationals are always reduced, so "48 ÷ 256" reduces to "3 ÷ 16". We need an VintN to represent the "3" and a VuintN to represent the "16", with the "16" needing the most bits, 5 bits, this will be a Vrational10.
So we can rewrite it as:
inert crat64 b = (((Vuint5("19") + Vuint7("88")) + Vcint14("0 ¡ 41")) + Vrational10("48 ÷ 256"));
We do the first '+'. In our table we see that for two uint types, the result should be Vuint<max(N,M)+1>. In our case Vuint<max(5,7)+1> or Vuint8.
inert crat64 b = (((Vuint8("19 + 88") + Vcint14("0 ¡ 41")) + Vrational10("48 ÷ 256"));
For the next step we have a Vuint8 and a Vcint14 (Vcint<27>). Our table gives us Vcint<2(max(N,M-1)+2)>, in our case Vcint<2*(max(8,7-1)+2)>, what is Vcint20.
inert crat64 b = (Vcint20("19 + 88 + "0 ¡ 41") + Vrational10("48 ÷ 256"));
Now for the last one. We have a Vcint20 (Vnint<210>) and a Vrational10 (VRational<25>). The table gives us Vcrat<4(max(N-1,M-1)+2)>, in our case Vcrat<4(max(10-1,5-1)+2)> or Vcrat44.
A virtual type Vcrat44 will get promoted to the smallest runtime crat type that is at least as wide, so in this case a crat64, what makes our expression valid. If we had tried to assign it to a crat32 we would have gotten a compile error.
Now let's demonstrate the consequences of virtual types being line bound.
inert uint8 a = 19
inert crat32 b = a + 88 + 0 ¡ 41 +
Now in the first line, "19" again is a Vuint5, that now is turned into an uint and assigned to a. This changes the expression in the second line to:
inert crat64 b = (((Vuint7("a") + Vuint7("88")) + Vcint14("0 ¡ 41")) + Vrational10("48 ÷ 256"));
The Vuint<max(N,M)+1> now becommes Vuint<max(8,7)+1> or Vuint9.
inert crat64 b = (((Vuint9("a + 88") + Vcint14("0 ¡ 41")) + Vrational10("48 ÷ 256"));
In the next step, Vcint<2(max(N,M-1)+2)> becomes Vcint<2(max(9,7-1)+2)> or a Vcint22
inert crat64 b = (Vcint22("a + 88 + "0 ¡ 41") + Vrational10("48 ÷ 256"));
So now in our last step with Vcint22 (Vcint<211>) and Vrational10 (Vrational<25>, our Vcrat<4(max(N-1,M-1)+2)> becomes Vcrat<4(max(11-1,5-1)+2)> or Vcrat48. While this has no consequences for our code, we still added four virtual bits to our virtual type. In some cases these bits could carry an expression to a type twice as big as necessary. For now this is something we accept for Merg-E because implementing compile time shadow types (like an uint8 that semantically is a Vuint5) would add yet another language feature to a list that is already quite big for a one-man spare-time pet project.
A note on rational reduction
It is important to note that in the v0.3 version of the language spec reduction of rationals only happens for rational literals. Merg-E does not maintain a canonical version of rationals across calculations. It is expected that future versions of the language spec will implement a canonical modifier for operators that will be valid if the operator result would be a VrationalN or a VcratN.
A less likely but possible second addition could be a modifier like canreduce that would act as a last resort reduction that only triggers when an overflow would otherwise have occured. This is a feature that absolutely belongs in Merg-E, but it won't be on the top of my priorities list.
Conclusions
The use of type promotion with the possibility to disable promotion and the possibility to disable runtime errors on over/under-flow errors, makes the current Merg-E operator system augment the type system in a way that prioritizes safety and fidelity over the convenience of small numeric types. It does so in a limited way. Virtual sizes aren't sticky and don't survive between different lines of the program, but they somewhat help prevent a bitwidth explosion. The integer bitwidth generics helps to allow operator implementation to remain manageable, although as seen in the table, there are still a lot of cases to implement for just a single operator. We have 49 lines in the table for just one operator. If we ignore the 21 'swap' entries, there are still 28 left, 13 demotions and 15 promotions with optional fallback to non-promotion. Even while this creates a lot to implement, I feel strongly this is a solid base for a safety and security first language aimed at the Web 3.0 space.