This is part nineteen in a series on the 0.3 version of the language spec for the Merg-E Domain Specific Language for the InnuenDo Web 3.0 stack. I'll add more parts to the below list as the spec progresses:
- part 1 : coding style, files, merging, scoping, name resolution and synchronisation
- part 2 : reverse markdown for documentation
- part 3 : Actors and pools.
- part 4 : Semantic locks, blockers, continuation points and hazardous blockers
- part 5 : Semantic lexing, DAGs, prune / ent and alias.
- part 6 : DAGs and DataFrames as only data structures, and inline lambdas for pure compute.
- part 7 : Freezing
- part 8 : Attenuation, decomposition, and membranes
- part 9 : Sensitive data in immutables and future vault support.
- part 10 : Scalars and High Fidelity JSON
- part 11 : Operators, expressions and precedence.
- part 12 : Robust integers and integer bitwidth generic programming
- part 13 : The Merg-E ownership model, capture rules, and the --trustmebro compiler flag.
- part 14 : Actorcitos and structural iterators
- part 15 : Explicit actorcitos, non-inline structural iterators, runtimes, and abstract scheduler pipeline.
- part 16 : async functions and resources and the full use of InnuenDo VaultFS
- part 17: RAM-Points, RAM-points normalization bag, and the quota-membrane.
- part 18: Literal operators & Rational and Complex numbers.
- part 19: Interaction between operators, integer bitwidth generics, and the full numeric type-system.
- part 20 (v0.4): Compile-time dimensional analysis, SI/Planck units and the scaling literal operator.
- part 21 (v0.4): Tensors and tensor literals.
In this post I want to elaborate a bit on the interaction between numeric types and operators.
While much of what follows covers (incomplete) implementation details of built-in runtime operators. Details not fully part of the language spec, because they are mostly implementation details, I felt it necessary to outline these interactions following a discussion on the FRIAM Google Group.
Let's do a quick recap of the numeric types.
Runtime numeric types
We have a collection of "runtime numeric" type families that come in different sizes. Many sizes are too big to map directly to hardware numeric types.
- Whole numbers: uint1, uint2, uint4, uint8, uint16, uint32, uint64, uint128, uint256, uint512, uint1024, uint2048, uint4096. uint8192 and uint16384
- Integers: int2, int4, int8, int16, int32, int64, int128, int256, int512, int1024, int2048, int4096. int8192 and int16384
- Integer complex numbers: cint16, cint32, cint64, cint128, cint256, cint512, cint1024, cint2048, cint4096. cint8192, cint16384 and cint32768
- Rational numbers: rational16, rational32, rational64, rational128, rational256, rational512, rational1024, rational2048, rational4096. rational8192, rational16384 and rational32768
- Rational complex numbers: crat32, crat64, crat128, crat256, crat512, crat1024, crat2048, crat4096. crat8192, crat16384, crat32768 and crat65536
- Irrational numbers: float16, float32, float64, float128 and float256
- Irrational complex numbers: complex32, complex64, complex128, complex256 and complex512
Virtual numeric types
On top of these runtime types that only exist at power of two bitwidth sizes, we have a wide range of compile time virtual numeric types. For the v0.3 version of the Merg-E spec, these types have single expression resolve time, meaning that they need to resolve to runtime types within a single expression line.
- Whole numbers: Vuint1 .. Vuint32768 in steps of 1
- Integers: Vint2..Vint32768 in steps of 1
- Integer complex numbers: Vcint16 .. Vcint65536 in steps of 2
- Rational numbers: Vrational16 .. Vrational65536 in steps of 2
- Rational complex numbers: Vcrat32 .. vcrat131072 in steps of 4
Generic numeric types
And finally, just for the implementation of (custom and language native) operators, we have generic numeric types that can represent any of the above:
- Whole numbers: guint
- Integers: gint
- Integer complex numbers: gcint
- Rational numbers: grational
- Rational complex numbers: gcrat
- Irrational numbers: gfloat
- Irrational complex numbers: gcomplex
Safety & preserved fidelity by default
Merg-E tries to be safe by default, and that safety by default includes preserving both overflow/underflow safety and guaranteed preservation of fidelity by operators by default. To make this possible, multiple implementations of an operator are often needed to cover all possible combinations of numeric types, and to always return the appropriate safe virtual or runtime type for the result that is guaranteed not to overflow or underflow, and will preserve the fidelity of the least fidelity operator capture.
The implementation has integer bitwidth generics math at its disposal to implement these properties. But let's look at one built-in operator and see what it should do with two numeric values.
The below table shows the operator, the type of the left capture, and the type of the right capture. It can do one of three things:
- Use the default of promoting the result type to a width that is guaranteed to preserve the invariants without underflows, overflows or reduced fidelity.
- Swap the left capture with the right capture and try again
- Demote because a limited fidelity irrational makes promotion futile.
If promotion is the available behaviour, the user can use the nopromote modifier in the operator expression, exposing the expression to runtime errors in case of overflow and underflow. If next to the nopromote a hazardous modifier is used, the overflow or underflow is considered proper behaviour by the user, and no runtime errors are thrown.
Here is the table for the '+' operator, that is for the + operator to implement:
unsigned int additions
| LC type | RC Type | behaviour | (virtual) result type | nopromote | description |
|---|---|---|---|---|---|
| uint<N> | uint<M> | promote | Vuint<max(N,M)+1> | uint<max(N,M)> | The largest of the two plus one overflow bit |
| uint<N> | int<M> | promote | Vint<max(N,M-1)+2> | Vint<max(N,M+1)+1> | The largest number of non sign-bits plus an overflow and a sign bit |
| uint<N> | cint<2*M> | promote | Vcint<2*(max(N,M-1)+2)> | Vcint<2*M> | Twice the largest number of non sign-bits plus an overflow and sign bit for both. |
| uint<N> | rational<2*M> | promote | Vrational<2*(N+M+1)> | Vrational<2*(N+M)> | With rationals addition requires multiplication, so N+M plus an overflow bit for the actual addition. |
| uint<N> | crat<4*M> | promote | Vcrat<4*(N+M+1)> | Vcrat<4*(N+M)> | Like rational, but twice because it's a complex number made up of two rationals |
| uint<N> | float<M> | demote | float<M> | There is a float with limited fidelity, we demote to the lowest fidelity | |
| uint<N> | complex<2*M> | demote | complex<2*M> | As for float, but twice because it is a complex number made up of two floats |
signed int additions
| LC type | RC Type | behaviour | (virtual) result type | nopromote | description |
|---|---|---|---|---|---|
| int<N> | uint<M> | swap/again | Maps to b + a | ||
| int<N> | int<M> | promote | Vint<max(N-1,M-1)+2> | Vint<max(N,M)> | The largest number of non-sign bits plus an overflow bit plus a sign bit |
| int<N> | cint<2*M> | promote | Vcint<2*(max(N-1,M-1)+2)> | Vcint<2*M> | Twice the size as for int, because it's a complex number made up of two integers |
| int<N> | rational<2*M> | promote | Vrational<2*(N+M)> | Vrational<2*(N+M-1)> | With rationals addition requires multiplication, so N+M plus an overflow bit for the actual addition minus one for the double sign bit |
| int<N> | crat<4*M> | promote | Vcrat<4*(N-1+M)> | Vcrat<4*(N+M-1)> | As for rational, but twice because we have a complex number made up of rationals |
| int<N> | float<M> | demote | float<M> | There is a float with limited fidelity, we demote to the lowest fidelity | |
| int<N> | complex<2*M> | demote | complex<2*M> | As for float, but twice because it is a complex number made up of two floats |
Integer based complex number additions
| LC type | RC Type | behaviour | (virtual) result type | nopromote | description |
|---|---|---|---|---|---|
| cint<2*N> | uint<M> | swap/again | Maps to b + a | ||
| cint<2*N> | int<M> | swap/again | Maps to b + a | ||
| cint<2*N> | cint<2*M> | promote | Vcint<2*(max(N-1,M-1)+2)> | Vcint<2*max(N,M)> | The largest number of non-sign bits plus an overflow bit plus a sign bit. Twice because it is a complex number of ints. |
| cint<2*N> | rational<2*M> | promote | Vcrat<4*(N+M)> | Vcrat<4*(N+M-1)> | A complex int plus a rational becomes a complex rational that takes up two int/uint rational pairs. Because rational addition involves multiplication, that is four times N plus M minus one for the double sign bit, plus one for overflow. |
| cint<2*N> | crat<4*M> | promote | Vcrat<4*(N+M)> | Vcrat<4*(N+M-1)> | As for complex int plus a rational |
| cint<2*N> | float<M> | demote | complex<2*M> | There is a float with limited fidelity, we demote to the lowest fidelity | |
| cint<2*N> | complex<2*M> | demote | complex<2*M> | As for float, but twice because it is a complex number made up of two floats |
rational additions
| LC type | RC Type | behaviour | (virtual) result type | nopromote | description |
|---|---|---|---|---|---|
| rational<2*N> | uint<M> | swap/again | Maps to b + a | ||
| rational<2*N> | int<M> | swap/again | Maps to b + a | ||
| rational<2*N> | cint<2*M> | swap/again | Maps to b + a | ||
| rational<2*N> | rational<2*M> | promote | Vrational<2*(N+M)> | Vrational<2*(N+M-1)> | With rationals addition requires multiplication, so twice N-1+M-1 plus an overflow bit plus a sign bit |
| rational<2*N> | crat<4*M> | promote | Vcrat<4*(N+M)> | Vcrat<4*(N+M-1)> | As for rational, but twice because we have a complex number made up of rationals |
| rational<2*N> | float | demote | float<M> | There is a float with limited fidelity, we demote to the lowest fidelity | |
| rational<2*N> | complex<2*M> | demote | complex<2*M> | As for float, but twice because it is a complex number made up of two floats |
rational based complex additions
| LC type | RC Type | behaviour | (virtual) result type | nopromote | description |
|---|---|---|---|---|---|
| crat<4*N> | uint<M> | swap/again | Maps to b + a | ||
| crat<4*N> | int<M> | swap/again | Maps to b + a | ||
| crat<4*N> | cint<2*M> | swap/again | Maps to b + a | ||
| crat<4*N> | rational<2*M> | swap/again | Maps to b + a | ||
| crat<4*N> | crat<4*M> | promote | Vcrat<4*(N+M)> | Vcrat<4*(N+M-1)> | With rationals addition requires multiplication, so twice N-1+M-1 plus an overflow bit plus a sign bit |
| crat<4*N> | float<M> | demote | complex<2*M> | There is a float with limited fidelity, we demote to the lowest fidelity | |
| crat<4*N> | complex<2*M> | demote | complex<2*M> | As for float |
Floating point additions
| LC type | RC Type | behaviour | (virtual) result type | nopromote | description |
|---|---|---|---|---|---|
| float<N> | uint<M> | swap/again | Maps to b + a | ||
| float<N> | int<M> | swap/again | Maps to b + a | ||
| float<N> | cint<2*M> | swap/again | Maps to b + a | ||
| float<N> | rational<2*M> | swap/again | Maps to b + a | ||
| float<N> | crat<4*M> | swap/again | Maps to b + a | ||
| float<N> | float<M> | demote | float<min(N,M)> | There is a float with limited fidelity, we demote to the lowest fidelity | |
| float<N> | complex<2*M> | demote | complex<2*min(N,M)> | As for float, but twice because it is a complex number made up of two floats |
Floating point based complex number additions
| LC type | RC Type | behaviour | (virtual) result type | nopromote | description |
|---|---|---|---|---|---|
| complex<2*N> | uint<M> | swap/again | Maps to b + a | ||
| complex<2*N> | int<M> | swap/again | Maps to b + a | ||
| complex<2*N> | cint<2*M> | swap/again | Maps to b + a | ||
| complex<2*N> | rational<2*M> | swap/again | Maps to b + a | ||
| complex<2*N> | crat<4*M> | swap/again | Maps to b + a | ||
| complex<2*N> | float<M> | swap/again | Maps to b + a | ||
| complex<2*N> | complex<2*M> | demote | complex<2*min(N,M)> |
How to read the tables
There are basically three types of line in the table:
- promotion lines : These normally result in wider result types
- swap/again lines : These are there for DRY reasons id T1 + T2 is the same as T2 + T1
- demotion lines : These deal with limited fidelity (float based) types.
Promotion lines
Here is an example of a promotion line. We are adding up an N bits wise unsigned integer and an integer based complex number that is constructed out of two M wide signed integers.
| LC type | RC Type | behaviour | (virtual) result type | nopromote | |
|---|---|---|---|---|---|
| uint<N> | cint<2*M> | promote | Vcint<2*(max(N,M-1)+2)> | Vcint<2*M> | Twice the largest number of non sign-bits plus an overflow and sign bit for both. |
The expression Vcint<2*(max(N,M-1)+2)> should be read as:
- This function returns a virtual Integer complex number with a bit width of 2*(max(N,M-1)+2) bits.
The max(N,M-1) part tells the implementation to start off comparing the two core sizes without the sign bit, and take the bigest of the two. The +2 adds a bit back for the sign, plus one bit to prevent overflows. Finally, because the result is complex, we multiply by two and get the proper Vcint bitwidth.
Imagine we pick uint16 and cint32, both N and M will be 16. When we fill in these numbers we get a Vcint36. A Vcint36 won't fit into a cint32, it needs a cint64, which is quite a bit bigger. If we know what we are doing we can add the nopromote modifier after the (sub) expression:
inert uint16 a = 1000;
inert cint32 b = 783 ¡ 1933 nopromote;
inert cint32 c = a + b;
While this won't have fitted without nopromote, the use of the modifier prompts the operator to look at the nopromote entry in the table: Vcint<2*M>
Basically we take the virtual variant of the right capture, the complex number. Nothing smart, simply the type of the side that has the ability but not necessarily the capacity to store the result.
Swap again lines
| LC type | RC Type | behaviour | (virtual) result type | nopromote | |
|---|---|---|---|---|---|
| cint<2*N> | uint<M> | swap/again | Maps to b + a |
We showed how to add a cint to an uint, but adding an uint to a cint is exactly the same. This isn't true for all operators, it is for + and * but not for - and / .As we don't want to repeat ourselves and we already have an implementation for uintN + cintM, the above line tells the system: don't mind there is no match for a + b, make it b + a and try again.
demotion lines
| LC type | RC Type | behaviour | (virtual) result type | nopromote | |
|---|---|---|---|---|---|
| uint<N> | float<M> | demote | float<M> | There is a float with limited fidelity, we demote to the lowest fidelity |
A third type of line in the table are the demotion lines. If one of the two types involved has lower fidelity than the other one, any more fidelity than the worst one is considered false fidelity. As such in these operations Merg-E aims to demote the result to the type with the most suitable fidelity.
In the above line we see that if we add any unsigned integer to any floating point number, the result type is the type of the floating point number. It is important to note that where full fidelity types can be assigned to wider versions of their type but not narrower ones, for floating point types the inverse is true as we don't want to introduce false fidelity.
So this is default:
inert float32 a = 3.1415727;
inert uint8 b = 1;
inert float32 c = a + b;
This is valid, even if we erase fidelity:
inert float32 a = 3.1415727;
inert uint8 b = 1;
inert float16 c = a + b;
But if we want to increase fidelity we need to use the hazardous modifier with our operator:
inert float32 a = 3.1415727;
inert uint8 b = 1;
inert float64 c = a + b hazardous;
A full example
inert crat64 b = 19 + 88 + 0 ¡ 41 + 48 ÷ 256;
Merg-E evaluates same precedence operators left to right, and numeric literals start out as virtual typed, so this evaluates as:
Let's zoom in:
- "19" has no minus, so the literal is unsigned. We need 5 bits to represent 19, so it's a Vuint5
- "88" has no minus, so the literal is unsigned. We need 7 bits to represent 88, so it's a Vuint7
- "0 ¡ 41" consists of integer values, so it's a cint (integer complex number) consisting of two signed integers. 41, the biggest of the two numbers needs 7 bits to represent, times two means it's a Vcint14.
- "48 ÷ 256" is a rational literal. Compile time available rationals are always reduced, so "48 ÷ 256" reduces to "3 ÷ 16". We need a VintN to represent the "3" and a VuintN to represent the "16", with the "16" needing the most bits, 5 bits, this will be a Vrational10.
So we can rewrite it as:
inert crat64 b = (((Vuint5("19") + Vuint7("88")) + Vcint14("0 ¡ 41")) + Vrational10("48 ÷ 256"));
We do the first '+'. In our table we see that for two uint types, the result should be Vuint<max(N,M)+1>. In our case Vuint<max(5,7)+1> or Vuint8.
inert crat64 b = (((Vuint8("19 + 88") + Vcint14("0 ¡ 41")) + Vrational10("48 ÷ 256"));
For the next step we have a Vuint8 and a Vcint14 (Vcint<27>). Our table gives us Vcint<2(max(N,M-1)+2)>, in our case Vcint<2*(max(8,7-1)+2)>, what is Vcint20.
inert crat64 b = (Vcint20("19 + 88 + "0 ¡ 41") + Vrational10("48 ÷ 256"));
Now for the last one. We have a Vcint20 (Vcint<210>) and a Vrational10 (VRational<25>). The table gives us Vcrat<4(N+M)>, in our case Vcrat<4(10+5)> or Vcrat60.
A virtual type Vcrat60 will get promoted to the smallest runtime crat type that is at least as wide, so in this case a crat64, what makes our expression valid. If we had tried to assign it to a crat32 we would have gotten a compile error.
Now let's demonstrate the consequences of virtual types being line bound.
inert uint8 a = 19
inert crat64 b = a + 88 + 0 ¡ 41 + 48 ÷ 256;
Now in the first line, "19" again is a Vuint5, that now is turned into an uint and assigned to a. This changes the expression in the second line to:
inert crat64 b = (((Vuint7("a") + Vuint7("88")) + Vcint14("0 ¡ 41")) + Vrational10("48 ÷ 256"));
The Vuint<max(N,M)+1> now becomes Vuint<max(8,7)+1> or Vuint9.
inert crat64 b = (((Vuint9("a + 88") + Vcint14("0 ¡ 41")) + Vrational10("48 ÷ 256"));
In the next step, Vcint<2(max(N,M-1)+2)> becomes Vcint<2(max(9,7-1)+2)> or a Vcint22
inert crat64 b = (Vcint22("a + 88 + "0 ¡ 41") + Vrational10("48 ÷ 256"));
So now in our last step with Vcint22 (Vcint<211>) and Vrational10 (Vrational<25>, our Vcrat<4(N+M)> becomes Vcrat<4(11+5)> or Vcrat64, so it still fits in a crat64, even if the virtual size grew from 60 to 64 bits. We should expect many edge cases where four bits like these will push required types into a type twice as big.
For now this is something we accept for Merg-E because implementing compile time shadow types (like an uint8 that semantically is a Vuint5) would add yet another language feature to a list that is already quite big for a one-man spare-time pet project.
On size limits
In Merg-E the maximum scalar type size is linked to the memory management design for the compiled-runtime version of the language. Right now it is undecided if the base memory management page will be 4 kbyte or 16 kbyte, but the max integer size is set to 16 kbit (2kbyte). This means it is currently uncertain if a 8 kbyte crat type is going to exist or not.
If we run out of bits for exact type families, and would need axact types outside of what can be expressed with upto 2 kbyte whole numbers or integers, 4 kbyte complex integers, 4kbyte rationals or 4 kbyte (or possibly 8 kbyte) complex rationals, the return type will degrade to the highest fidelity inexact type.
| exact type | degrades to |
|---|---|
| uint | float256 |
| int | float256 |
| cint | complex512 |
| rational | float256 |
| crat | complex512 |
Again, note that for now crat65536 is not defined, and it's future existence depends on implementation details for the future memory management system. It is likely that in order to suport Apple computers, the Merg-E memory management system will need to use 16 kbyte memory pages, allowing for 8 kbyte crat65536 complex rationals, but as these pages define the minimum memory footprint granularity of scheduable scopes, I'm still exploring efficient ways to support Apple machines without making 16k the memory management granularity. If I find a 4 kbyte solution, crat65536 won't get defined.
A note on rational reduction
It is important to note that in the v0.3 version of the language spec reduction of rationals only happens for rational literals. Merg-E does not maintain a canonical version of rationals across calculations. It is expected that future versions of the language spec will implement a canonical modifier for operators that will be valid if the operator result would be a VrationalN or a VcratN.
A less likely but possible second addition could be a modifier like canreduce that would act as a last resort reduction that only triggers when an overflow would otherwise have occured. This is a feature that absolutely belongs in Merg-E, but it won't be on the top of my priorities list.
Conclusions
The use of type promotion with the possibility to disable promotion and the possibility to disable runtime errors on over/under-flow errors, makes the current Merg-E operator system augment the type system in a way that prioritizes safety and fidelity over the convenience of small numeric types. It does so in a limited way. Virtual sizes aren't sticky and don't survive between different lines of the program, but they somewhat help prevent a bitwidth explosion. The integer bitwidth generics helps to allow operator implementation to remain manageable, although as seen in the table, there are still a lot of cases to implement for just a single operator. We have 49 lines in the table for just one operator. If we ignore the 21 'swap' entries, there are still 28 left, 13 demotions and 15 promotions with optional fallback to non-promotion. Even while this creates a lot to implement, I feel strongly this is a solid base for a safety and security first language aimed at the Web 3.0 space.