Starting with an initial distribution, evolution took place with the basic Andrews-Curtis moves. I ran two loops with two steps each, and then resumed and ran 3 more loops. The raw results are at first run results
In terms of the highest weights, the method worked well even with such a short run. For example, the presentation $\langle a, b; \bar{a}, b\rangle$ was one of those with high weight.
In general, the order was according to expected behaviour.
However, virtually all theorems were remembered.

We build differentiable functions for our basic learning system using some basic ones and combinators:
for a vector space $V$.
By the Leibniz rule, the total derivative is
Note that this depends on the vector space structure on $(\mathbb{R}, V)$. For a vector $w$ in $V$, note that we can write
By definition of gradient (as adjoint derivative), we have
We need an additional implicit structure with combinators:

Linear Spaces can be built with

In addition, we need inner products for computing certain gradients as well as totals for normalization. These are built for

Differentiable functions are built from
From the above, we should be able to build.
Given a function $g$, we get a function at depth $d$ by iterating it $2^d$ times, using repeated squaring.
Determined by:
Recurrence relation:
We begin by representing the two simplest propositions: true, which is always true, and false.
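In Agda, these can be defined as below (a sketch; the constructor name $qed$ is my choice, and any name would do):

```agda
data True : Set where
  qed : True      -- the unique proof of True

data False : Set  -- no constructors, so no objects
  where
```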

The $True$ type has just one constructor, giving an object with type $True$. On the other hand, the $False$ type has no constructors, so there are no objects of type $False$.
There are various ways of building propositions from other propositions. We see how these translate to constructions of types.
Suppose $P$ and $Q$ are propositions, which we identify with their corresponding types. We interpret the proposition $P \implies Q$ as the function type $P \to Q$.
Modus Ponens is the rule of deduction (due to Aristotle) that says that if the proposition $P$ is true, and $P$ implies $Q$, then $Q$ is true. We can prove this in the types interpretation. Namely, Modus Ponens translates to the statement that if we have objects of types $P$ and $P \to Q$, then we have an object of type $Q$. We get an object of type $Q$ by function application.
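A sketch of this proof in Agda (the name $modusPonens$ is mine):

```agda
modusPonens : (P Q : Set) → P → (P → Q) → Q
modusPonens P Q p impl = impl p   -- apply the implication to the proof of P
```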

Next, we see my favourite method of proof: vacuous implication. This says that a false statement implies everything, i.e., for any proposition $P$, we have $False \implies P$, which in type theory says $False\to P$ has objects.
As the $False$ type has no cases at all, a function is defined on $False$ by using an absurd pattern, which just says that there are no cases, so no definition is needed.
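A sketch of such a definition, using the absurd pattern $()$:

```agda
vacuous : (P : Set) → False → P
vacuous P ()   -- no cases: False has no constructors
```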

Next, we define a type family $Even\ n$, which is nonempty if and only if $n$ is even. To do this, we observe that a number is even if and only if this follows from the rules
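These two rules become the two constructors of an inductive type family (a sketch; constructor names are illustrative, with $\Bbb{N}$, $zero$ and $succ$ the natural numbers defined earlier):

```agda
data Even : ℕ → Set where
  zeroEven    : Even zero                                 -- 0 is even
  plusTwoEven : (n : ℕ) → Even n → Even (succ (succ n))   -- if n is even, so is n + 2
```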

We can prove that $2$ is even by applying the second rule, with $n=0$, to the first rule (telling us that $0$ is even).
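A sketch of the proof term:

```agda
twoEven : Even (succ (succ zero))
twoEven = plusTwoEven zero zeroEven   -- second rule with n = 0, applied to the first rule
```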

We next show that $1$ is odd. This means that we have to show that the type $Even 1$ is empty. While rules let us construct objects, and verify their types, there is no rule that tells us directly that a type is empty.
However, there is a nice way of capturing that a type $A$ is empty: if there is a function from $A$ to the empty type $False$, then $A$ must be empty, as there is nowhere for an object of type $A$ to be mapped.
Indeed, what we prove is that there is a function from $Even\ 1$ to $False$; we define this using an absurd pattern.
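A sketch, again using an absurd pattern, as no constructor can produce an object of type $Even\ 1$:

```agda
oneOdd : Even (succ zero) → False
oneOdd ()   -- neither constructor of Even matches the index 1
```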

One of the most fundamental concepts in homotopy type theory is the identity type family, representing equality between objects with a given type. This is an inductive type family, generated by the reflexivity constructor giving an equality between an object and itself.
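A sketch of the definition:

```agda
data _==_ {A : Set} : A → A → Set where
  refl : (a : A) → a == a   -- reflexivity: every object equals itself
```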

Note that while this is an inductive type family, for a fixed object a the type $a==a$ is not an inductive type defined by $refl(a)$, i.e., we cannot define (dependent) functions on this type by just defining them on the reflexivity constructor. This is a subtle point, which will become clearer as we look at the topological interpretation. We shall study the identity type extensively.
For now, let us show some basic properties of the identity type. All these are proved by constructing objects by pattern matching (recall that these are dependent functions, so we are formally constructing them by induction, not recursion).
Firstly, if $f$ is a function and $a==b$ (i.e., there is an object of type $a==b$), then $f(a)==f(b)$ (again, we mean there is an object of this type).
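A sketch, with the traditional name $cong$ (for congruence):

```agda
cong : {A B : Set} → (f : A → B) → {a b : A} → a == b → (f a) == (f b)
cong f (refl a) = refl (f a)   -- matching on refl forces b to be a
```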

Further, we see that equality (given by the identity type) is symmetric and transitive.
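Sketches of both proofs (the proof of transitivity uses the dot pattern discussed below):

```agda
sym : {A : Set} → {a b : A} → a == b → b == a
sym (refl a) = refl a

trans : {A : Set} → {a b c : A} → a == b → b == c → a == c
trans (refl a) (refl .a) = refl a   -- .a : the types force this argument to be a
```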

There is a new Agda feature we have used in the proof of transitivity: the dot notation. Notice that we have a term $.a$: this says that we can deduce, from the types, that $a$ is the only possibility at its position in the pattern.
Suppose we have a type $A$ and a family of types $P(a)$ (which we regard as propositions), with a type associated to each object $a$ of type $A$. Then all the types $P(a)$ have objects (i.e., all corresponding propositions are true) if and only if there is a dependent function mapping each object $a$ of type $A$ to an object of type $P(a)$. Thus, the logical statement $\forall a : A,\ P(a)$
translates to the product type $\prod_{a : A} P(a)$.
In Agda, we represent the product type as $(a : A) \to P(a)$
We can now prove a more substantial result. Namely, suppose we have a function $f : \Bbb{N}\to \Bbb{N}$ , and suppose for all $n\in \Bbb{N}$ we have $f(n) = f(n+1)$, then we show that, for all $n$, $f(n)=f(0)$.
Such a statement is normally proved by induction (indeed any proof must reduce to induction). We see that our pattern matching rules for constructing dependent functions suffice to prove this; this is why the dependent function that gives them is called the induction function.

Let us look at the statement and proof more carefully. Firstly, note that the statement is of the form $(f : \Bbb{N} \to \Bbb{N}) \to (P(f) \to Q(f))$.
As we have seen, the first term is a translation of $\forall f : \Bbb{N} \to \Bbb{N}$, so the statement says that for all functions $f : \Bbb{N} \to \Bbb{N}$ (or any other codomain with obvious modifications), we have $P(f) \to Q(f)$.
This in turn is a translation of $P(f) \implies Q(f)$. So we can regard $P(f)$ as the hypothesis, for a fixed function $f$, and $Q(f)$ as the desired conclusion.
The hypothesis $P(f)$ is the statement $(m : \Bbb{N}) \to f(m+1) == f(m)$,
which is just the statement that for all $m$, $f(m+1)=f(m)$. Finally, the conclusion $Q(f)$ just says that $f(n)=f(0)$ for all $n$.
We now look at the proof. The two constructors correspond to the base case and the induction step. Here the base case is just $f(0)=f(0)$, which follows from reflexivity of equality.
The induction step is more complicated. We prove the result for $n+1$ assuming the result for $n$. The main hypothesis $\forall m, f(m+1) = f(m)$, is, by pattern matching, $adjEq$. The right hand side, which is the proof in the case of $n+1$, is obtained from:
This proof is remarkable in many ways. First of all, note that it is no longer or more complicated than an informal proof. Further, note that we did not have to invoke the usual induction axiom schema, but instead just used the rules for constructing objects. Most significantly, as most of our types are inductive types (or type families), we get recursive definitions and inductive proofs in all these cases.
Indeed, using recursive definitions for inductive types we get all so-called universal properties. Furthermore, traditionally universal properties require a separate uniqueness statement. But recursion-induction is powerful enough to even give the uniqueness statements for universal properties. This means a great deal of mathematical sophistication (universal algebra, various aspects of category theory) is encapsulated in these simple functions.
We now see the types corresponding to the other ways of combining propositions: and, or, and there exists.
Firstly, if $A$ and $B$ are types corresponding to propositions, then there are objects with each of these types if and only if there is a pair $(a, b)$ of the pair type $A \times B$, which we define as follows.
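A sketch of the definition, with the single constructor written $[\_,\_]$:

```agda
data _×_ (A B : Set) : Set where
  [_,_] : A → B → A × B   -- a pair [ a , b ] from an object of A and one of B
```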

We have to use square brackets, as parentheses have a special meaning in Agda. Observe that there is a single constructor that takes as input an object in $A$ and an object in $B$. We regard the result as the pair.
Next, suppose $A$ and $B$ are types corresponding to propositions and we wish to construct a type corresponding to $A$ or $B$, then we require a type whose elements are elements of $A$ and elements of $B$, or more accurately the images of such elements under constructors. This is the direct sum type.
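A sketch, with the two inclusion constructors named $inl$ and $inr$ (the names are my choice):

```agda
data _⊕_ (A B : Set) : Set where
  inl : A → A ⊕ B   -- image of an element of A
  inr : B → A ⊕ B   -- image of an element of B
```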

Finally, if we are given a collection of types $B(a)$ for objects $a$ of type $A$, we construct the type corresponding to at least one of these types having an element, i.e., a there exists type.
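A sketch, with the constructor name $witness$ my choice:

```agda
data ∃ (A : Set) (B : A → Set) : Set where
  witness : (a : A) → B a → ∃ A B   -- an element a together with an element of B(a)
```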

Notice that the constructor for this type has as input an element $a$ and an element of type $B(a)$. Such elements exist if and only if some $B(a)$ is nonempty, i.e., the corresponding proposition is true.
As we see, we can express all the usual mathematical statements using types built up using our basic constructions: inductive types, functions and dependent functions. We have also seen that the basic rules for constructing objects are powerful rules of deduction. However, there are some things they cannot deduce, for instance the statement (called the axiom of extensionality) that if $f, g: A\to B$ are functions with $f(a)=g(a)$ for all $a \in A$, then $f=g$. Hence, we have to introduce this as a postulate: we just postulate that there is an object (which we give a name) of a given type.
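A sketch of such a postulate, using the identity type $\_==\_$ from before:

```agda
postulate
  extensionality : {A B : Set} → (f g : A → B) →
                   ((a : A) → (f a) == (g a)) → f == g
```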

We can similarly introduce axioms specific to a domain, say Euclidean geometry, by postulating them in a module.
Let us start with a function checking if a list contains an element with a certain property.
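A sketch of such a function, using a helper $\_or\_$ on Booleans (the helper is mine; $Bool$ and $List$ are from earlier posts):

```agda
_or_ : Bool → Bool → Bool
true or _  = true
false or b = b

contains : {A : Set} → List A → (A → Bool) → Bool
contains [] p       = false                       -- the empty list contains nothing
contains (a :: l) p = (p a) or (contains l p)     -- a satisfies p, or the rest contains one
```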

Before turning to filtering, we define an if expression, which gives one of two values depending on whether a Boolean term is $true$ or $false$. Note that this is not an if statement, doing something according to a condition, but a function that returns a value depending on the condition.
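A sketch, using Agda's mixfix syntax so the expression reads naturally:

```agda
if_then_else_ : {A : Set} → Bool → A → A → A
if true then x else y  = x
if false then x else y = y
```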

With this in hand, we define a function filtering elements of a list by a given property. This is defined inductively, using the if expression to decide whether to prepend the first element.
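A sketch, using the if expression just defined:

```agda
filter : {A : Set} → List A → (A → Bool) → List A
filter [] p       = []
filter (a :: l) p = if (p a) then (a :: (filter l p)) else (filter l p)
```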

So far we have defined many objects and types, but not actually computed anything concrete. To do this, we shall use a very useful function on lists, folding. This is a function that takes as input:
The fold is obtained by starting with the given element in $B$, and successively multiplying on the left by elements of the list, starting with the rightmost. This stops when the list is empty, giving an element in $B$. Thus, if we omit parentheses (assuming associativity of the operation), and if $*$ denotes the operation, then folding is the function $(a_1 :: a_2 :: \dots :: a_n :: [],\ b) \mapsto a_1 * a_2 * \dots * a_n * b$.
As usual, we define the fold function recursively.
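A sketch of the recursive definition:

```agda
fold : {A B : Set} → (A → B → B) → B → List A → B
fold op b []       = b                       -- empty list: just the starting element
fold op b (a :: l) = op a (fold op b l)      -- multiply on the left by the head
```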

Equipped with this, we now give a function computing the sums of squares of numbers from 1 to n. After importing natural numbers, we define (recursively) a function that gives the list of numbers from 1 to n. We then map this to get a list of squares. Finally, we fold this by the addition function to get the sum of squares.
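A sketch, assuming $\_+\_$, $\_*\_$, $\_++\_$ and $map$ from the earlier posts (the names $upto$ and $sumOfSquares$ are mine):

```agda
upto : ℕ → List ℕ
upto zero     = []
upto (succ n) = (upto n) ++ ((succ n) :: [])   -- list 1, 2, ..., n

sumOfSquares : ℕ → ℕ
sumOfSquares n = fold _+_ zero (map (λ m → m * m) (upto n))
```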

In the Agda mode of emacs, we can evaluate, for example, the sum of squares up to $20$ to get $2870$. This illustrates that what we are doing does include general purpose programming: indeed it implements an older model of computing, due to Church, which is equivalent to what may be more familiar models.
We next define a function that finds an element in a list (of type $A$) having a given property. However the list may have no such element, so we cannot always return an object in $A$. Simply giving an error is meaningless mathematically and a bad idea for a program.
To deal with such a situation, we use the type $Option A$, whose objects are $Some a$ for objects $a$ of type $A$, and an object $None$ representing no result.
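A sketch of the definition:

```agda
data Option (A : Set) : Set where
  Some : A → Option A   -- a result
  None : Option A       -- no result
```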

We can now define a find function, returning an option type, containing an object in the list with the given property if there is such an object.
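A sketch, using the if expression from before:

```agda
find : {A : Set} → List A → (A → Bool) → Option A
find [] p       = None
find (a :: l) p = if (p a) then (Some a) else (find l p)
```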

Thus, using option types, we can represent partially defined functions: those defined on some objects of a type $A$. One often encounters partial functions: finding in a list, taking square roots, division (avoiding division by $0$), etc. We can lift such a partially defined function to one taking values in an option type. We shall identify partial functions with their lifts.
Suppose we have a partially defined function $f: A\to B$ (not defined on some values of $A$ in general) and a function $g : B \to C$ (defined everywhere). Then it is natural to define the composition of $f$ and $g$ as a partial function defined wherever $f$ is defined. Passing to lifts, this is accomplished by the map function on option types.
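A sketch of the map function on option types:

```agda
mapOption : {B C : Set} → (B → C) → Option B → Option C
mapOption g None     = None          -- f undefined, so the composition is too
mapOption g (Some b) = Some (g b)    -- apply g to the result of f
```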

Even better, if $f: A\to B$ and $g : B\to C$ are both partially defined, we can compose them to get a value in $C$ exactly for those elements $a : A$ for which $f(a)$ is defined and lies in the domain of $g$. We do this by flatmapping (after passing to lifts once more).
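A sketch of flatmap on option types:

```agda
flatMapOption : {B C : Set} → Option B → (B → Option C) → Option C
flatMapOption None g     = None    -- f was undefined
flatMapOption (Some b) g = g b     -- g b is itself an Option C
```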

There are some obvious common features between lists and option types.
Types that have these properties (and some consistency relations between the embeddings, map and flatmap) are called Monadic. There are many other useful such types, for example $Future A$ is used to represent an object of type $A$ that will be available in the future, for instance after a long computation. We do not have to wait for the computation to be over before saying what to do with the result when available, even if this involves another long computation which will return a result as a Future.
Dependent functions generalize functions, with the above properties continuing to hold. What we no longer require is that $f(a)$ has a fixed type independent of $a$, namely the codomain $B$. Instead we have a family of codomains $B(a)$, so that $f(a)$ has type $B(a)$.
Such objects are common in mathematics (and physics). For example, the velocity of water flowing on a sphere gives a vector field on the sphere. At a point $x$ on the sphere, the value of the vector field $V$ lies in the tangent space at the point, i.e., $V(x) \in T_x S^2$.
Hence it is best to view vector fields as dependent functions. In mathematics, the codomains are generally called fibers, which together form a fiber bundle, and dependent functions are called sections of this bundle.
We can (and often do) regard a dependent function as an ordinary function with codomain the union of the various codomains. But, besides losing information, the function we get is not natural, in the sense that it does not respect the underlying symmetries.
We now turn to some simple examples and code. First we consider type families, which give the collection of codomains for dependent functions. The typical example is vectors of length $n$ of elements of a type $A$. Formally, a type family is just a function with codomain a universe, so the values the function takes are types.
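A sketch of the definition (reusing the constructor names of lists, which Agda permits):

```agda
data Vec (A : Set) : ℕ → Set where
  []   : Vec A zero                              -- the empty vector has length 0
  _::_ : {n : ℕ} → A → Vec A n → Vec A (succ n)  -- prepending increases length by 1
```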

This gives a family of types parametrized by A and indexed by natural numbers. The difference between parameters and indices is a bit subtle but crucial. Observe that the Agda syntax treats them quite differently.
We defined Booleans and natural numbers using the data statement, and defined functions on them by pattern matching. More conceptually, these are inductive types, and functions defined on them are defined by applying the recursion function. For instance, in the case of Booleans, the recursion function takes as input a type $A$ and two objects with that type (the values of $true$ and $false$) and gives a function from Booleans to $A$.
In the case of lists, for each type $A$, we obtain a corresponding inductive type. Thus we have a family of inductive types, parametrized by the type $A$.
In the case of vectors too, the type $A$ acts as a parameter. Assume that the type $A$ is fixed, so vectors are now a family of types indexed by natural numbers.
However, the vectors of a fixed size (say $7$) do not form an inductive type: we cannot define a function recursively on vectors of length $7$ alone. In this case, this is evident from the definition, as the constructor giving vectors of size $7$ uses vectors of size $6$. So a recursive definition must also involve vectors of size $6$, hence of $5$, etc.
We can, however, recursively define functions on vectors of all sizes, i.e., for all values of the index. For example, here is the function that appends (adds to the end) an element to a vector.
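A sketch; note that the length in the result type goes up by one:

```agda
append : {A : Set} → {n : ℕ} → Vec A n → A → Vec A (succ n)
append [] x       = x :: []
append (a :: v) x = a :: (append v x)
```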

Thus, vectors form an inductive type family indexed by natural numbers (and parametrized by A). As we remarked, the type for a given index is not an inductive type. Note that even in cases where we can meaningfully write down a recursion rule for the type at a fixed index, such a recursion rule does not in general give a function on that type.
Remark: From the point of view of programming languages, there is another sense in which indexing by natural numbers is different from parametrizing by types  the types we construct depend on objects, not just other types. Modern languages usually allow types to depend on other types (sometimes called generics), but most do not allow dependence on objects.
We shall now construct a dependent function countdown that maps a natural number $n$ to the list consisting of natural numbers from $n$ down to $0$. Thus the type of $countdown(n)$ is vectors of natural numbers of length $n+1$.
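A sketch of the definition:

```agda
countdown : (n : ℕ) → Vec ℕ (succ n)
countdown zero     = zero :: []                  -- counting down from 0: just [0]
countdown (succ m) = (succ m) :: (countdown m)   -- n, then count down from n - 1
```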

The definition in terms of pattern matching is similar to recursive definitions of functions. In terms of homotopy type theory, dependent functions on inductive types are constructed by applying a dependent function called the induction function to the data.
The type of a dependent function is called the product type corresponding to the family of types with base the domain. For instance, the type of countdown is $(n : \Bbb{N}) \to Vec\ \Bbb{N}\ (n+1)$.
Except for universes (which we will keep in the background as far as possible), we have now seen all the type constructions: inductive types, functions and dependent functions.
A principal use of types in programming is to avoid writing meaningless expressions, by ensuring that such expressions violate the rules for constructing objects and types. Dependent types are even better than this. For instance, consider componentwise addition of vectors, or more generally componentwise application of a binary operation. This makes sense only when both vectors have the same length. Using dependent functions and types, we can define the function in such a way that it is defined only on pairs of vectors with the same length.
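A sketch; the name $zipWith$ is my choice. As both vectors share the index $n$, applying the function to vectors of different lengths is a type error, caught at compile time:

```agda
zipWith : {A : Set} → {n : ℕ} → (A → A → A) → Vec A n → Vec A n → Vec A n
zipWith op [] []             = []
zipWith op (a :: v) (b :: w) = (op a b) :: (zipWith op v w)
```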

Note that we could have used lists in place of vectors, but we would then have to give a definition that can lead to errors at runtime.
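The natural numbers are an inductive type with two constructors, as described below (a sketch):

```agda
data ℕ : Set where
  zero : ℕ       -- 0
  succ : ℕ → ℕ   -- the successor function
```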

Here $zero$ is $0$ and $succ$ is the function taking any natural number to its successor. Any natural number can be built from these, and different expressions give different numbers. However, as with the case of Booleans, we do not have any explicit rules giving such statements; indeed, it would be quite painful to formulate such rules, as we would have to define expressions, equality of expressions, etc. Instead, the rules for defining functions from natural numbers implicitly contain such statements. We define functions on natural numbers by pattern matching, but this time the patterns can involve the constructors.
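For instance, addition can be defined as below (a sketch):

```agda
_+_ : ℕ → ℕ → ℕ
zero + n     = n
(succ m) + n = succ (m + n)   -- recursive: rewrites a sum in terms of a simpler sum
```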

The left hand sides pattern match on the constructors for natural numbers. Notice that this is a recursive definition: the right hand side in the second pattern for defining addition also involves addition. This is fine, as the definition lets us rewrite a sum in terms of sums of simpler terms, and such a process eventually terminates. However, we can write nonsensical definitions in general.
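For instance, a definition along the following lines (a sketch, which Agda in fact rejects) never terminates:

```agda
forever : ℕ → ℕ
forever zero     = zero
forever (succ n) = forever (succ (succ n))   -- forever(1) = forever(2) = forever(3) = ...
```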

When we try to compute $forever(1)$ (where $1 = succ(zero)$), we get that $forever(1) = forever(2)$, and this continues forever. Fortunately, Agda will not let us make such definitions. This does come at a price though: sometimes it takes a lot of work to convince Agda that functions terminate. There are deeper, conceptual limitations to any system ensuring termination, as discovered by Turing, which we return to in the future.
In homotopy type theory, the rules for such recursive definitions are nicely formulated so that we cannot make nonsensical definitions. We shall see how this is done much later. We now define a couple of other recursive functions.
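For instance, multiplication and the factorial can be defined recursively (a sketch; the particular functions are my guesses):

```agda
_*_ : ℕ → ℕ → ℕ
zero * n     = zero
(succ m) * n = n + (m * n)     -- (m + 1) * n = n + m * n

factorial : ℕ → ℕ
factorial zero     = succ zero
factorial (succ m) = (succ m) * (factorial m)
```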

Our constructors are unfortunately only as efficient as tally marks for dealing with actual numbers. Agda has a built-in type that lets us deal with natural numbers the usual way. We can invoke it as below.
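A sketch of the pragma in recent versions of Agda (older versions used separate BUILTIN ZERO and SUC pragmas); with it, a numeral like $3$ abbreviates $succ (succ (succ\ zero))$ and arithmetic is computed efficiently:

```agda
{-# BUILTIN NATURAL ℕ #-}
```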

We shall now define the type of Lists of objects, each of which is of a given type A. Thus, we are defining not just one type, but a family of types parametrized by the type A.
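A sketch of the definition:

```agda
data List (A : Set) : Set where
  []   : List A              -- the empty list
  _::_ : A → List A → List A -- prepend an element of A to a list
```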

The constructor $[]$ is just a constant, giving us the empty list. The constructor $ _ :: _$ gives a list from an element $a$ in $A$ and a list $l$ of elements in $A$ by adding $a$ to the beginning of the list $l$. For instance (after importing the natural numbers constructed earlier) we can describe the list [1, 2, 3] as
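A sketch, using numerals for the natural numbers and a name of my choice:

```agda
onetwothree : List ℕ
onetwothree = 1 :: (2 :: (3 :: []))
```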

We can make things a little cleaner by specifying that $_::_$ is right associative. We shall discuss this later. Let us now define the length of a list. Here is our first definition.
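A sketch, with the type $A$ passed explicitly:

```agda
length : (A : Set) → List A → ℕ
length A []       = zero
length A (a :: l) = succ (length A l)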

We have given a recursive definition. We defined a function of the type $A$ as well as the list, as we needed $A$ on the right hand side of the second pattern. But $A$ can be inferred in this pattern: it is the type of the element $a$. So we can declare $A$ to be an implicit argument (by putting it in braces), and let Agda infer its value. This gives us a cleaner definition.
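A sketch of the cleaner version, with $A$ in braces so Agda infers it:

```agda
length : {A : Set} → List A → ℕ
length []       = zero
length (a :: l) = succ (length l)
```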

Next, we define some functions recursively. The first is concatenation, which combines two lists by giving the entries of the first followed by those of the second.
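A sketch, with concatenation written as the operator $\_++\_$:

```agda
_++_ : {A : Set} → List A → List A → List A
[] ++ l        = l
(a :: l) ++ l' = a :: (l ++ l')
```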

We next define the function that reverses a list.
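A sketch, using concatenation to move the head to the end:

```agda
reverse : {A : Set} → List A → List A
reverse []       = []
reverse (a :: l) = (reverse l) ++ (a :: [])
```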

We now turn to some more interesting functions. Given a list of objects of type $A$ and a function $f:A \to B$, we can apply $f$ to each entry of the list to get a list of elements of $B$. This is usually called the $map$ function.
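A sketch of the map function:

```agda
map : {A B : Set} → (A → B) → List A → List B
map f []       = []
map f (a :: l) = (f a) :: (map f l)
```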

We can do more  if we have a function $f: A \to List B$, then we can map a list $l$ of elements of $A$ to a list of elements of $B$  each element of $l$ maps to a list of elements of $B$, and we get a list by concatenating these lists together.
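A sketch of the flatmap function, concatenating the image lists:

```agda
flatMap : {A B : Set} → (A → List B) → List A → List B
flatMap f []       = []
flatMap f (a :: l) = (f a) ++ (flatMap f l)
```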

In case you did not notice, we have been programming in a functional language. To those used to imperative languages (say C), this may not look like programming, but programming in functional languages is essentially building functions, collections and structures. Indeed, anyone who has programmed in (for example) Scala for a while hungers for a flatmap function. We will eventually see some of the other main ingredients of collections in functional languages: finding in, filtering and folding lists. But before that we shall take a first look at dependent types.
Mathematicians like to imagine that their papers could in principle be formalized in set theory. This gives them a feeling of security, not unlike the one experienced by a devout man entering a venerable cathedral. It is a form of faith professed by logicians. Homotopy Type Theory is an alternative foundation to set theory.
In this series of posts, we explore the background behind, and the basics of, homotopy type theory. We begin by laying out the basic rules governing such foundations.
We regard almost everything in mathematics as an object: not just numbers or groups, but even theorems, proofs, axioms, constructions, etc. This lets us consider functions acting on, and collections of, just about everything. Strictly speaking, objects should be called terms, based on standard terminology of logic, but I shall use the word object as it better represents how I feel we should think about them.
Every object has a type, which is generally (but not always) unique. Types themselves are objects, so in particular have types. A type whose objects are themselves types is called a universe. Every set is a type, but not all types are sets.
Our foundations are governed by rules telling us the ways in which we can construct objects, which we will see as we go on. Further, we have rules letting us make two types of judgements: that an object $a$ has a type $A$, and that two objects (or types) are equal by definition.
Objects may be equal even if they are not so by definition. For instance, for a real number x, we have the equality
which it would be silly to say is true by definition. Rather, it is a proposition that is true, and we say the two sides are propositionally equal (but not definitionally or judgementally equal). Similar considerations also apply to objects having specified types.
In addition to constructing objects by our rules, we can introduce objects and make some judgements related to them as axioms  this may of course introduce inconsistencies.
Let us begin constructing objects. We shall do this in the language/formal proof system Agda, but I will try to be self-contained.
We start with a built-in universe called Set: objects having type Set are sets. We shall use the data statement to construct the type of Booleans, which is a basic logical type used to represent whether a statement is true or false.
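A sketch of the data statement:

```agda
data Bool : Set where
  true  : Bool
  false : Bool
```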

The above code gives the construction of three objects: a type called Bool and two objects true and false that have type Bool. This is a simple rule for constructing a type  give it a name and list objects in it.
Note that we do not have an explicit rule saying that the type Bool has no other objects, or even that true and false are different objects. Indeed we will not introduce any such rule, but rules for constructing objects will implicitly tell us that Bool has exactly two distinct objects, true and false.
A central role in type theory is played by function types: one may even say that the principal virtue of type theory is treating functions with the respect they deserve (as any good programming language does), instead of trying to wrap them up as subsets of cartesian products that happen to satisfy some properties. We now define our first function, and our first function type.
Given types A and B, functions from A to B give a type, which we denote $A \to B$.
We construct the function $not$ from Booleans to Booleans, which is the logical negation.
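A sketch of the definition:

```agda
not : Bool → Bool
not true  = false
not false = true
```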

We have defined the function not by giving the values on each Boolean constant, and these values are constants. This is the simplest form of pattern matching.
Note that being able to construct such a function means $true$ and $false$ are distinct objects as promised.
Let us now turn to functions of two variables. By a trick due to the logician Haskell Curry, we do not have to introduce a new type. Instead, observe that if we have a function $f(x,y)$ of two variables (of types $A$ and $B$, taking values of type $C$), then for fixed $x$ we get a function of $y$ only
$f(x , _ ) = y \mapsto f(x,y)$
so $f(x, _ )$ has the type of functions from $B$ to $C$. Now viewing $f(x, _)$ as a function of $x$, we get the curried form of $f$,
which has type $A \to (B \to C)$.
We define the logical and (in its curried form). A convenient Agda feature is that we can define functions so that their arguments can appear in the beginning, end, middle or any other combination, just by putting underscores as placeholders.
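A sketch, writing the operation infix as $\_\&\_$ (the symbol is my choice):

```agda
_&_ : Bool → Bool → Bool
true & b  = b       -- if the first argument is true, the result is the second
false & _ = false   -- underscore: the second argument does not affect the result
```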

The definition is a small extension of pattern matching we used in defining not. If some argument does not affect the result, we can use an underscore to represent all cases.
So far we have defined functions by defining their values on each constant as a constant. In general we use variables to represent arguments, and the values are built from existing objects. Here is an example.
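An example of this kind (the particular function is my guess) applies $not$ twice, using a variable $b$ for the argument:

```agda
notnot : Bool → Bool
notnot b = not (not b)
```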

We used a variable for the argument of the function, another generalization of pattern matching. The right hand side was built from previously constructed objects using function application: a function $f$ from $A$ to $B$ can be applied to an object $a$ with type $A$ to get an object $f(a)$ with type $B$.
Functions can be specified by describing the image of each argument. These are given by λ-expressions, following the notation of Church's λ-calculus. We define such a function, $verytrue$, below, by equating it with an anonymous function that forms the right hand side.
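A sketch (the body shown is my guess; what matters is the λ-expression on the right hand side):

```agda
verytrue : Bool → Bool
verytrue = λ b → not (not b)
```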

We can define (curried) functions of more than one variable using λ-expressions without having to explicitly nest the λs.
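A sketch (the particular function is my guess; it assumes the and operation defined above):

```agda
nand : Bool → Bool → Bool
nand = λ a b → not (a & b)   -- λ with two variables, no explicit nesting
```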

We have now seen the most basic constructions of types and objects. We shall next turn to the more powerful constructions: inductive types and dependent types.