# Real-world type theory I: untyped normalisation by evaluation for λ-calculus

This is the first in a series of posts written in which I work through some topics in “real-world” implementation of type-theoretic concepts and the more “engineering”-like aspects of writing a theorem prover, inspired by András Kovács’s MSE literature survey on the topic. The eventual goal is to attain enough proficiency in both type theory and proof assistant engineering to be able to implement my own Idris-like language, but that’s probably far off. Still, it’s worth getting started earlier than not.

Please contact me about any errors or with any comments, either on Twitter, or at `chow dot soham at gmail`

.

All the code in this post is implemented from, and most of the ideas and information come from, chapters 2 and 3 of Andreas Abel’s habilitation thesis, and to some extent can be said to be the output I generated while working through them trying to internalise “why everything really works”. It’s quite readable, and I recommend reading it if you enjoy this article.

# Introduction

Consider some simple language, say the untyped *λ*-calculus, here extended with lets and booleans:

\[ e ::= x \mathbin{|} \lambda x. e \mathbin{|} e\,e \mathbin{|} \mathsf{let}\,x = e\,\mathsf{in}\,e \pipe \true \pipe \false \]

Syntactic expressions in this language look like `x`

, `k (λx. λy. y)`

, `λz. let id = λy. y in id (z true)`

, and so on. Let us denote the set of all such expressions by \(\mathsf{Exp}\).

One of the most basic things we can do with an expression in a language like this is evaluate it. For our purposes, **evaluation** is the process of executing or running a program \(t\) in a language, transforming it into a *value*. To paraphrase Abel, the value that evaluation assigns to a program can reasonably be considered to be a way to assign a notion of “meaning” for the program. To that end, we might denote this value \(\meaning(t)\), and ask ourselves: what is a value? Put another way, where should our values live? It makes sense to clearly separate the language we are implementing – the *object* language – from the language we are implementing it *in* – the *meta*-language. To a compiler engineer, LLVM IR or x86 assembly are good things to which a program in a fancy language can be reduced. To a type theorist or a logician, it may be preferable to use set theory, exotic things like domains, or a mathematical presentation of a sufficiently powerful type theory (such as Martin-Löf type theory) as the metalanguage. For our purposes we will imagine ourselves as intrepid implementors of theorem provers, unconcerned with both theory and compilation for the time being, and content ourselves with Haskell types and values. For example, it would be satisfactory for us to be given the term `(λz. let id = λy. y in id true) false`

and have it evaluate to some Haskell value `ExprTrue`

, and subsequently pretty-print the string `true`

to standard output. As long as we can make sense of the meaning, it’s fine, isn’t it?

Evaluation is closely related to, but not the same as, **normalisation**: the process of transforming an expression \(e \in\mathsf{Exp}\) into an expression \(e’\) contained in a subset \(\mathsf{Nf} \subseteq \mathsf{Exp}\). Expressions in \(\mathsf{Nf}\) are called *normal forms*, and \(e’\) is called the *normal form* of \(e\). It is important to note that normalisation is an operation where we start and end up at the same level, that of syntax, and can therefore be taken to be a purely syntactic operation.

## Motivating the importance of normalisation

Why do we care about normalisation? Indeed, why are we interested in doing normalisation via evaluation as the title of this article tells us we’re going to look at doing?

Normalisation corresponds to a sensible notion of *computation* in most type systems; most theories with a natural number type, for example, will have terms like \(1 + 5\), \((1 + 1) * (1 + 1 + 1)\), and so on all reduce to \(6\). In Abel’s wonderful analogy, normalisation increases *entropy*, understood by analogy to thermodynamic entropy. In physics, entropy is a measure of *disorder* in a system. A pile of firewood contains a lot of complex organic molecules; lighting it on fire reduces it to much simpler molecules, reducing the total amount of “order” in the system, and converts some of the energy stored in chemical bonds to heat energy in the process. Energy is conserved, but the entropy of the (wood + oxygen + smoke + ashes) system increases in the process. In our PL-theoretic world, too, normalisation collapses — not chemical, but rather syntactic — structure.

Normalisation can intuitively be expected to satisfy certain properties which suggest its suitability as a notion of “computing the value of a complex expression”:

in a typed language, normalising a term should not change its type: \[ \frac{\Gamma\vdash t : T}{\Gamma\vdash \nf(t) : T}\,\mathrm{\scriptsize{NF{-}WELL{-}TYPED}} \]

normalisation should be

*idempotent*: \(\nf(t) = \nf(\nf(t)) \), andnormalisation should preserve the meaning of a term: \( \meaning(t) = \meaning(\nf(t)). \)

So if you care about computing things inside some type theory, such as implementations of theorem provers or expressively-typed programming languages are required to do, normalisation is an important thing to think about.

Indeed, if you care about type-level computation, which is important to implementations of dependent type systems, the need for efficient normalisation becomes clear. Most dependent type theories have a rule in their typing judgments that looks something like

\[ \frac{\Gamma\vdash t : B\quad\quad\Gamma: A = B}{\Gamma\vdash t : A}\,\mathrm{\scriptsize{CONV}} \]

This says that if we want to show that a term \(t\) has type \(A\), it suffices to show it has type \(B\) for some other type \(B\), if we can also prove that \(A\) and \(B\) are *convertible*, which means “can be reduced to the same term”. In most type systems, this term is the normal form, so we can check for convertibility by normalising. For example, in a type theory with an “*η*-law for functions”, the terms \(t\) and \(\lambda x.t \, x\) both reduce to whatever \(t\) reduces to, and are hence convertible.

With all this said, it should be clear that for many type theories, typechecking involves applying that theory’s version of a \(\mathrm{\scriptsize{CONV}}\) rule many times, which usually reduces to a lot of normalisation, and hence to a lot of computations of normal forms. As our type systems increase in expressiveness, these computations become more and more complex; as David Christiansen’s NbE tutorial discusses, in general, types in a dependently-typed language can contain entire programs that we have to run to perform type-checking!

Computing normal forms efficiently is important: and that’s exactly what “normalisation by evaluation” lets us do.

This post is literate Haskell, so we enable a few language extensions we’ll be using.

## Aside: notions of equality

Convertibility also often coincides with (and is used as a synonym for) *judgmental equality*, which refers to a notion of equality that arises from a judgment that explicitly tells us how we can tell when two types are convertible. (In the typing rule above this judgment is written \(A = B\); you’ll also see \(\equiv\) used in a lot of places.) Another common term for the same thing is *definitional equality*, which literally means an equality that holds “by definition”. These forms of equality can be contrasted with *propositional equality*, which refers to a form of equality that is internal to the type theory, and about which we can prove things from within the theory itself. For example, we can define the following type to capture a form of propositional equality in Haskell (with the `GADTs`

extension enabled), and very similar things can be written in Agda or Idris:

With this, you can actually write the function

```
magic :: a -> Eq a b -> b
magic x e = case e of
Refl ->
-- in the body of this pattern-match,
-- GHC knows that `a` and `b` are the same
x
```

or even prove simple theorems like “propositional equality is transitive”, assuming you have accepted the Curry-Howard correspondence into your heart:

# The problem with normalisation by reduction

Let’s try to figure out our own algorithm, in a broad sense, for performing normalisation in a simple type theory (like our untyped *λ*-calculus from above). The obvious choice is to simply “interpret” or “compute” the value of the expression. This can be expressed more formally as “keep on *β*-reducing until you can’t do it anymore”, where *β*-reduction refers to rewriting an expression of the form “function of a variable, applied to an expression” – something that looks like \( (\lambda x. b) M\), which in the theory of programming languages is called a * β-redex*, from

*– by substituting the expression for the variable, which can be denoted in symbols as*

**red**ucible**ex**pression\[ (\lambda x. M) E \overset{\beta}{\longrightarrow} M[x := E] \]

— in other words, the obvious thing that one might do with a function applied to something. For example,

This process is called *reduction*, and the resulting way to do normalisation is called *normalisation by reduction*. In more complex type theories, there are many more rules than just *β*-reduction and *η*-expansion, all of which must be applied to perform normalisation.

Normalisation by reduction works, and for a simple-minded implementation of a typechecker (such as all the ones I’ve written in the past), there’s nothing wrong with it. The bad thing about using normalisation by reduction in a “production” typechecker, such as one you’d put in a theorem prover that real people use, is that explicit syntactic substitution involves a lot of manipulation of abstract syntax trees. Compared to the alternative that this post is about, this means that normalisation by reduction is usually *slower*, and it also happens to be quite tricky to implement.

Instead, we can construct a way to perform normalisation by taking advantage of efficient methods for *evaluation*.

# Normalisation by evaluation for the (untyped) *λ*-calculus

We will illustrate normalisation by evaluation in an *untyped* setting, by using it to study the untyped *λ*-calculus in this article. NbE also has a typed variant, which is the one we’re actually interested in (we care about type systems with expressive, often dependent notions of typing: at least, I hope “we” do *winks*). We’ll address typed NbE in a future post, but the simplicity of the untyped *λ*-calculus makes it an ideal setting in which to come to grips with how NbE works. (At least, it did for me.)

Recall from the opening of this article that our goal is to start with a syntactic expression of some type \(\Exp\) describing our syntax and end up with a normal form of type \(\Nf\), which describes the normal forms in our syntax. At a high level, the untyped NbE algorithm works by *interpreting*, *reflecting*, or *evaluating* our syntax \(\Exp\) into a *semantic domain* \(\D\), then *reifying* or *quoting* the semantic value back into the normal-form subset \(\Nf\) of our syntax. David points out on Twitter that this recipe works because the semantic domain \(\D\) is constructed in such a way that semantic values satisfy the equalities we demand from our normal forms, so when we evaluate syntax into semantics and then quote back into syntax, the resulting syntax satisfies those equalities as well and hence can be constrained to the “normal form” subset \(\Nf \subseteq \Exp\).

## Background: De Bruijn indices and levels

De Bruijn indices are a tool used to describe variable binding that provide a lot of convenience both for theoretical purposes and for someone implementing a language using them, compared to describing bindings using, say, variable name-based (*named*) representations. Using (0-based) De Bruijn indices, the “S combinator” – \[\lambda x.\lambda y.\lambda z. x z (y z)\] – would be written `λ λ λ 2 0 (1 0)`

. The number \(n\) refers to how many *λ*s are in between the variable and its binding site: the most recently bound variable is `0`

, the one bound before that is `1`

, and so on.

De Bruijn levels are similar, and in some sense arise from the “opposite” or “dual” convention, in which the number \(n\) refers to the \(n\)th lambda from the beginning. In this notation, the S combinator would be written `λ λ λ 0 2 (1 2)`

.

It should be clear how to convert between DB-indices and -levels if one considers the problem for a little while. Abel depicts the relation between them in an expression with \(n\) lambdas as follows:

\begin{alignat*}{10} \text{index}\quad &0\quad &&1\quad &&\ldots\quad &&k\quad &&\ldots\quad &&n-i-1\quad &&\ldots\quad &&n-2\quad &&n-1\\ \text{level}\quad &n-1\quad &&n-2\quad &&\ldots\quad &&n-k-1\quad &&\ldots\quad &&i\quad &&\ldots\quad &&1\quad &&0 \end{alignat*}

## Implementing NbE for the untyped *λ*-calculus

We define the grammar that specifies the syntax of untyped *λ*-calculus (without lets or booleans!) itself below. In addition, we also define two additional grammars that specify the syntax of two subclasses of terms, called *neutral* terms and *normal* terms.

\begin{alignat*}{4} \text{expressions} \quad &\Exp &&\ni r, s, t \quad &&::=\quad \vi \pipe \lambda t \pipe r\,s \\ \text{neutral terms} \quad &\Ne &&\ni u \quad &&::=\quad \vi \pipe u\,v \\ \text{normal terms} \quad &\Nf &&\ni v \quad &&::=\quad u \pipe \lambda v \end{alignat*}

Neutral terms are normal terms that are “blocked” or “stuck”, either because some variable has an unknown value (the term is *open*), or we are trying to apply something to a value but cannot realise our “something” as a function (in other words, we cannot *β*-reduce or perform “substitution”). This property of neutral terms is immediate from the above definitions. Indeed, in defining neutral and normal terms, we’ve cleverly divided up the three elements of the grammar of expressions – variables, abstractions, and application — in such a way that normal and neutral terms are forced to be *β*-normal. (A term is *β*-normal if it contains no *β*-redexes, that is, if no further *β*-reduction is possible.) We see that our application form \(u\,v\) can only apply a neutral term to something, and neutrals cannot contain or be *λ*-abstractions, as is obvious from the grammar. Similarly, normal terms are where all the lambdas end up residing, but they offer us no way to *apply* anything to a *λ*-abstraction.

We represent the *syntactic domain* (which is what we’ll call the (original) “object language” above that captures the *syntax* of the untyped *λ*-calculus) in Haskell as follows:

```
> data Exp
> = ExpVar Int
> | ExpLam Exp
> | ExpApp Exp Exp
>
> data Ne = NeVar Int | NeApp Ne Nf
>
> data Nf = NfNeut Ne | NfLam Nf
```

The *semantic domain*, which is the collection of grammars capturing “meaning” (or the “modified object language” that we interpret the object language *into*, if you prefer), looks like this:

\begin{alignat*}{3} \text{environments} \quad&\Env &&\ni \rho \quad &&::=\quad\text{functions}\,\, \mathbb{N}\to \D\\ \text{} \quad &\D &&\ni d \quad &&::=\quad (\underline{\lambda} t) \rho \pipe e \\ \text{\(\D\) extended with neutrals} \quad &\Dne &&\ni e \quad &&::= \quad \xx_k \pipe e\, d \end{alignat*}

Semantic terms \(\xx_k\) represent variables, but unlike the \(\var_k\) variables from the syntactic setup, they do so using De Bruijn *levels* instead of indices. This is because semantic terms refer to “absolute” variables that are akin to constants, rather than index-like references whose meaning is context-dependent. Due to this property, they lend themselves well to being represented by levels instead of indices, since the latter have to be adjusted by incrementing when placed under new binders. This design decision frees us from considering complications like having to build some notion of context shifting into the theory of the semantic domain or evaluation-quotation itself, and means we never have to perform the bookkeeping or increments/decrements common in most presentations of De Bruijn index-based variable handling. Note how the definitions of \(\D\) and \(\Dne\) mimic those of \(\Ne\) and \(\Nf\) in what forms of term they allow us to represent.

We translate the semantic domain into code in terms of the following types. First, an environment (also often called a *context*) is a mapping of De Bruijn indices (hence variables) to semantic terms (values):

Semantic terms are either closures containing a syntactic term and an environment in which to make sense of it, or stuck semantic terms – semantic *neutrals* represented using the type `Dne`

– that we can’t reduce any further.

Note how `D`

values built using the `Clos`

constructor contain environments represented as native Haskell functions.

Next we have a “context shifting” operation called `update`

, which can be thought of as pushing us one *λ*-binder deeper: it binds \(\var_0\) to `d`

and updates the rest of the context to match.

We have two mutually recursive functions that help us go from syntax to semantics: given an environment assigning meanings to variables, `eval`

interprets an syntactic expression into the semantic domain, giving us a semantic term, aided by `apply`

, which performs function application (substitution!) in the semantic domain.

```
> eval :: Exp -> Env -> D
> eval e rho = case e of
>
> -- look it up in the environment
> ExpVar i -> rho i
>
> -- store the body of the lambda in a closure, together with the environment
> -- we saw it in
> ExpLam t -> Clos t rho
>
> -- interpret the function and the argument, and then substitute (efficiently!)
> ExpApp r s -> apply (eval r rho) (eval s rho)
>
> apply :: D -> D -> D
> apply (Clos t rho) d = eval t (update rho d)
> apply (Up e) d = Up (App e d)
```

`quoteNf`

and `quoteNe`

perform *quotation*, or *read-back* in Abel’s terminology, of semantic terms back into the syntactic domain. The only interesting case is the `Clos`

clause in `quoteNf`

:

```
> quoteNf :: Int -> D -> Nf
> quoteNf n (Clos t rho) = NfLam (quoteNf (n + 1) b)
> where
> -- after "entering" a new binder using the update function,
> -- using a "fresh" variable,
> rho' = update rho (Up (Var n))
> -- evaluate the body of the lambda on the new, innermost variable
> -- (remember that the semantic domain uses DB-levels)
> b = eval t rho'
> quoteNf n (Up e) = NfNeut (quoteNe n e)
>
> quoteNe :: Int -> Dne -> Ne
> quoteNe n (Var k) = NeVar (n - k - 1) -- DB-level to -index
> quoteNe n (App e d) = NeApp (quoteNe n e) (quoteNf n d)
```

Finally, normalisation is performed by first evaluating syntactic terms into the semantic domain, and then quoting the result back to give us a normal form.

```
> -- | Compute the normal form of an open term with at most `n` free variables.
> nfOpen :: Int -> Exp -> Nf
> nfOpen n t = quoteNf n (eval t rho)
> where rho i = Up (Var (n - i - 1)) -- DB-index to -level
>
> -- | Compute the normal form of a closed term.
> nf :: Exp -> Nf
> nf = nfOpen 0
```

As a sanity check, we define the SKI combinators…

```
> expI = ExpLam (ExpVar 0)
>
> expSKK = ExpApp (ExpApp expS expK) expK
> where
> -- λ 0
> expI = ExpLam (ExpVar 0)
> -- λ λ 1
> expK = ExpLam (ExpLam (ExpVar 1))
> -- λ λ λ 2 0 (1 0)
> expS
> = ExpLam (ExpLam (ExpLam
> (ExpApp (ExpApp (ExpVar 2) (ExpVar 0))
> (ExpApp (ExpVar 1) (ExpVar 0)))))
```

… and confirm that `SKK = I`

when defined in terms of the lambda-calculus:

## A “pragmatic” variation on the textbook implementation

It is possible to entirely mechanically combine `D`

and `Dne`

(into a `Val`

type), by “inlining” the constructors for neutrals into the datatypes for normal forms. This makes some illegal states representable — for example, you can represent *β*-redexes in the semantic domain, which is kind of iffy — but has the advantage of reducing a bit of indirection and improving performance, and also just reducing the amount of code that we have to work with.

Let’s use a superscript C to denote the pieces of this new “collapsed” implementation. First, we can get rid of `Up`

and `NfNeut`

, which were just wrappers to bring neutrals into normal terms and values:

```
> data Expᶜ = ExpVarᶜ Int | ExpLamᶜ Expᶜ | ExpAppᶜ Expᶜ Expᶜ
> data Valᶜ = ValVarᶜ Int | ValClosᶜ Expᶜ Envᶜ | ValAppᶜ Valᶜ Valᶜ
> type Envᶜ = Int -> Valᶜ
```

and the code for the old `apply`

can be “inlined” into that of `eval`

:

```
> updateᶜ :: Envᶜ -> Valᶜ -> Envᶜ
> updateᶜ rho d 0 = d
> updateᶜ rho d i = rho (i - 1)
>
> evalᶜ :: Expᶜ -> Envᶜ -> Valᶜ
> evalᶜ e rho = case e of
> ExpVarᶜ i -> rho i
> ExpLamᶜ t -> ValClosᶜ t rho
> ExpAppᶜ r s -> case (evalᶜ r rho, evalᶜ s rho) of
> (ValClosᶜ t rho, d) -> evalᶜ t (updateᶜ rho d)
> (e, d) -> ValAppᶜ e d
```

If we proceed as above for quotation too, we can combine `Nf`

and `Ne`

, and there would only be one `quoteᶜ`

function (performing quotation to normal forms: our old `quoteNf`

), with `quoteNe`

inlined into `quoteᶜ`

where the `Up`

case was. It should be clear that collapsing neutrals into the type of normal forms gives us a type that is isomorphic to `Expᶜ`

, so we can go even further and just identify the two, quoting back into the original syntactic domain:

```
> quoteᶜ :: Int -> Valᶜ -> Expᶜ
> quoteᶜ n (ValVarᶜ k) = ExpVarᶜ (n - k - 1)
> quoteᶜ n (ValAppᶜ e d) = ExpAppᶜ (quoteᶜ n e) (quoteᶜ n d)
> quoteᶜ n (ValClosᶜ t rho) = ExpLamᶜ (quoteᶜ (n + 1) b)
> where
> rho' = updateᶜ rho (ValVarᶜ n)
> b = evalᶜ t rho'
>
> nfOpenᶜ :: Int -> Expᶜ -> Expᶜ
> nfOpenᶜ n t = quoteᶜ n (evalᶜ t rho)
> where rho i = ValVarᶜ (n - i - 1)
>
> nfᶜ :: Expᶜ -> Expᶜ
> nfᶜ = nfOpenᶜ 0
```

We can repeat our SKI combinator check from before. (I’ve put the definitions at the end of the article to reduce clutter.)

If you squint, except for the fact that we use a nameless (De Bruijn) representation instead of his explicitly named representation, this is almost the same as András’s implementation of untyped NbE for the *λ*-calculus! And he *really* knows what he’s doing, so I’ll call that a job well done.

# Acknowledgements

I’m grateful to David for suggesting some improvements to this article and clearing up some doubts I had. Thanks to Brendan for getting me interested in NbE in the first place, and to András for his help and for writing the literature review and the Elaboration Zoo.

## Odds and ends

I prefer putting derived instances at the end to help streamline the main presentation a little. (Every bit helps; I barely understand this material as is!)

```
> deriving instance Show Exp
> deriving instance Show Nf
> deriving instance Show Ne
>
> deriving instance Eq Exp
> deriving instance Eq Nf
> deriving instance Eq Ne
>
> deriving instance Show Expᶜ
> deriving instance Eq Expᶜ
```

I should really have used a tagless-final representation to avoid rewriting these… on second thought, maybe not.

```
> expIᶜ = ExpLamᶜ (ExpVarᶜ 0)
>
> expSKKᶜ = ExpAppᶜ (ExpAppᶜ expS expK) expK
> where
> -- λ 0
> expI = ExpLamᶜ (ExpVarᶜ 0)
> -- λ λ 1
> expK = ExpLamᶜ (ExpLamᶜ (ExpVarᶜ 1))
> -- λ λ λ 2 0 (1 0)
> expS
> = ExpLamᶜ (ExpLamᶜ (ExpLamᶜ
> (ExpAppᶜ (ExpAppᶜ (ExpVarᶜ 2) (ExpVarᶜ 0))
> (ExpAppᶜ (ExpVarᶜ 1) (ExpVarᶜ 0)))))
```