Some Thoughts About Type Theory
Semantics
Primitives
Let's start with Null, Unit, and Bool. Null is a type with no values. Unit is a type with one value, which I'll call unit. Bool is a type with two values, which I'll call false and true.
Now let's add Sigma and Pi. Sigma is an operator that creates a dependent pair type, i.e. a pair type in which the type of the second element can depend on the value of the first element. Pi is an operator that creates a dependent function type, i.e. a function type in which the type of the output can depend on the value of the input.
Let's use (Sigma (: t T) (F t)) to denote dependent pair types and (Pi (: t T) (F t)) to denote dependent function types. In this context t is a value of type T, while F is a function that accepts a value of type T and returns a type. The colon denotes type annotation. Parenthesis denote function application.
Datum Types
Datum1 := (Sigma (: b Bool) Unit)
Datum2 := (Sigma (: b Bool) Datum1)
Datum3 := (Sigma (: b Bool) Datum2)
Datum4 := (Sigma (: b Bool) Datum3)
Datum5 := (Sigma (: b Bool) Datum4)
Datum6 := (Sigma (: b Bool) Datum5)
Datum7 := (Sigma (: b Bool) Datum6)
Datum8 := (Sigma (: b Bool) Datum7)
The type Datum8 is our encoding for a byte. Conceptually, it's a binary tree of depth eight in which inner nodes are dependent pair types and each leaf node is the Unit type.
A value of type Datum8 is serialized as eight consecutive bits, followed by a unit that occupies no space. This can be interpreted as a path through the aforementioned tree, followed by a value of the type that exists at the end of that path. Let's use 0x00 through 0xFF to denote the possible values, and thus the possible paths through a binary tree of depth eight.
Union Types
(Union1 T00 T01) := (Sigma (: b Bool) (if b T00 T01))
(Union2 T00 ... T03) := (Sigma (: b Bool) (if b (Union1 T00 T01) (Union1 T02 T03)))
(Union3 T00 ... T07) := (Sigma (: b Bool) (if b (Union2 T00 ... T03) (Union2 T04 ... T07)))
(Union4 T00 ... T0F) := (Sigma (: b Bool) (if b (Union3 T00 ... T07) (Union3 T08 ... T0F)))
(Union5 T00 ... T1F) := (Sigma (: b Bool) (if b (Union4 T00 ... T0F) (Union4 T10 ... T1F)))
(Union6 T00 ... T3F) := (Sigma (: b Bool) (if b (Union5 T00 ... T1F) (Union5 T20 ... T3F)))
(Union7 T00 ... T7F) := (Sigma (: b Bool) (if b (Union6 T00 ... T3F) (Union6 T40 ... T7F)))
(Union8 T00 ... TFF) := (Sigma (: b Bool) (if b (Union7 T00 ... T7F) (Union7 T80 ... TFF)))
The type Union8 is our encoding for a union type with an eight bit tag. Conceptually, it's a binary tree of depth eight in which inner nodes are dependent pair types and each leaf node is some type specified by a type parameter.
A value of type Union8(T00 ... TFF) is serialized as a pair of elements concatenated together. This can be interpreted as a path through the tree (which specifies a type) followed by a payload (which inhabits the specified type). To use a value like this, you should specify a derivation from each possible payload type to a single known result type.
Note that we could reasonably use notation where these type parameters are Null by default, such that Union8(Unit) is equivalent to Union8(Unit Null ... Null).
Tuple Types
(Tuple1 T00 T01) := (Pi (: b Bool) (if b T00 T01))
(Tuple2 T00 ... T03) := (Pi (: b Bool) (if b (Tuple1 T00 T01) (Tuple1 T02 T03)))
(Tuple3 T00 ... T07) := (Pi (: b Bool) (if b (Tuple2 T00 ... T03) (Tuple2 T04 ... T07)))
(Tuple4 T00 ... T0F) := (Pi (: b Bool) (if b (Tuple3 T00 ... T07) (Tuple3 T08 ... T0F)))
(Tuple5 T00 ... T1F) := (Pi (: b Bool) (if b (Tuple4 T00 ... T0F) (Tuple4 T10 ... T1F)))
(Tuple6 T00 ... T3F) := (Pi (: b Bool) (if b (Tuple5 T00 ... T1F) (Tuple5 T20 ... T3F)))
(Tuple7 T00 ... T7F) := (Pi (: b Bool) (if b (Tuple6 T00 ... T3F) (Tuple6 T40 ... T7F)))
(Tuple8 T00 ... TFF) := (Pi (: b Bool) (if b (Tuple7 T00 ... T7F) (Tuple7 T80 ... TFF)))
The type Tuple8 is our encoding for a tuple type with an eight bit index. Conceptually, it's a binary tree of depth eight in which inner nodes are dependent function types and each leaf node is some type specified by a type parameter.
A value of type Tuple8(T00 ... TFF) is serialized as a sequence of elements concatenated together. This can be interpreted as a table for mapping a path through the tree (which specifies a type) to a payload (which inhabits the specified type). To use a value like this, you should specify a derivation from a single known payload type to a single known result type.
Note that we could reasonably use notation where these type parameters are Unit by default, such that Tuple8(Bool) is equivalent to Tuple8(Bool Unit ... Unit).
Comments
I know very little about type theory. I'm trying to understand what a serialization format informed by type theory might look like, but that's kind of an atypical goal. Take everything above this point and everything below this point with a grain of salt.
So, what about arrays? It seems like there should be an operator that creates an array type, but that raises some questions. Is the size of the array part of the type, or part of the serialized value? If it's part of the type, do you need to find an encoding for type-level integers? If it's part of the serialized value, how do you encode it, and how large can it be? How do you even create an array value? And how do you use it?
I suppose an array is similar to a list, in the same way that a byte is similar to a natural number. Can you constrain the size of inductive types? Can you encode them efficiently? It's possible that everything else is misguided, and you'd be better off using inductive types for everything. Hard to say, given my current level of understanding.
Regardless, it's interesting that values of distinct types can have the same representation after they're serialized. And it's also interesting that two types with the same number of possible values won't necessarily have the same size.
You could probably continue to pull threads here, if you wanted to. I only used Sigma and Pi for trivial derivations, but they should be powerful enough to do a lot more. And I didn't focus too much on function types, which are actually pretty interesting.
You could probably argue that the logical gate NAND has type (Tuple2 Datum1 Datum1 Datum1 Datum1) and value 0b1110. I suppose that's enough for complex functions, but enumerating every possible result isn't very efficient when functions are complex. Presumably that's why we build computers with a relatively small number of primitive operations, and then use opcodes to reference those operations. More proof that every problem in computer science can be solved by adding a layer of indirection.
Concise Notation
(+ ...) := Union256(...)
(* ...) := Tuple256(...)
0 := (+)
1 := (+ Unit)
2 := (+ Unit Unit)
3 := (+ Unit Unit Unit)
...