Starting with an initial distribution, evolution took place with the basic Andrews-Curtis moves. I ran two loops with two steps each, and then resumed and ran 3 more loops. The raw results are at first run results
In terms of the highest weights, the method worked well even with such a short run. For example, the presentation $\langle a, b; \bar{a}, b\rangle$ was one of those with high weight.
In general, the order was according to expected behaviour.
However, virtually all theorems were remembered.

We build differentiable functions for our basic learning system using some basic ones and combinators:
for a vector space $V$.
By the Leibniz rule, the total derivative is
Note that this depends on the vector space structure on $(\mathbb{R}, V)$. For a vector $w$ in $V$, note that we can write
By definition of gradient (as adjoint derivative), we have
We need an additional implicit structure with combinators:

Linear Spaces can be built with

In addition, we need inner products for computing certain gradients as well as totals for normalization. These are built for

Differentiable functions are built from
From the above, we should be able to build.
Given a function $g$, we get a function at depth $d$ by iterating it $2^d$ times, using repeated squaring.
Determined by:
Recurrence relation:
We begin by representing the two simplest propositions: true, which is always true, and false.
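In Agda, these can be defined as below (a sketch; the constructor name $qed$ is my choice, and any name would do):

```agda
data True : Set where
  qed : True      -- the unique proof of True

data False : Set  -- no constructors, so no objects
  where
```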

The $True$ type has just one constructor, giving an object with type $True$. On the other hand, the $False$ type has no constructors, so there are no objects of type $False$.
There are various ways of building propositions from other propositions. We see how these translate to constructions of types.
Suppose $P$ and $Q$ are propositions, which we identify with their corresponding types. We interpret the proposition $P \implies Q$ as the function type $P \to Q$.
Modus Ponens is the rule of deduction (due to Aristotle) that says that if the proposition $P$ is true, and $P$ implies $Q$, then $Q$ is true. We can prove this in the types interpretation. Namely, Modus Ponens translates to the statement that if we have objects of types $P$ and $P \to Q$, then we have an object of type $Q$. We get an object of type $Q$ by function application.
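A sketch of this proof in Agda (the name $modusPonens$ is mine):

```agda
modusPonens : (P Q : Set) → P → (P → Q) → Q
modusPonens P Q p impl = impl p   -- apply the implication to the proof of P
```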

Next, we see my favourite method of proof: vacuous implication. This says that a false statement implies everything, i.e., for any proposition $P$, we have $False \implies P$, which in type theory says $False\to P$ has objects.
As the $False$ type has no cases at all, a function is defined on $False$ by using an absurd pattern, which just says that there are no cases, so no definition is needed.
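A sketch of such a definition, using the absurd pattern $()$:

```agda
vacuous : (P : Set) → False → P
vacuous P ()   -- no cases: False has no constructors
```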

Next, we define a type family $Even\ n$, which is nonempty if and only if $n$ is even. To do this, we observe that a number is even if and only if this follows from the rules
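These two rules become the two constructors of an inductive type family (a sketch; constructor names are illustrative, with $\Bbb{N}$, $zero$ and $succ$ the natural numbers defined earlier):

```agda
data Even : ℕ → Set where
  zeroEven    : Even zero                                 -- 0 is even
  plusTwoEven : (n : ℕ) → Even n → Even (succ (succ n))   -- if n is even, so is n + 2
```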

We can prove that $2$ is even by applying the second rule, with $n=0$, to the first rule (telling us that $0$ is even).
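A sketch of the proof term:

```agda
twoEven : Even (succ (succ zero))
twoEven = plusTwoEven zero zeroEven   -- second rule with n = 0, applied to the first rule
```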

We next show that $1$ is odd. This means that we have to show that the type $Even 1$ is empty. While rules let us construct objects, and verify their types, there is no rule that tells us directly that a type is empty.
However, there is a nice way of capturing that a type $A$ is empty: if there is a function from $A$ to the empty type $False$, then $A$ must be empty, as there is nowhere for an object of type $A$ to be mapped.
Indeed, what we prove is that there is a function from $Even\ 1$ to $False$; we define this using an absurd pattern.
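A sketch, again using an absurd pattern, as no constructor can produce an object of type $Even\ 1$:

```agda
oneOdd : Even (succ zero) → False
oneOdd ()   -- neither constructor of Even matches the index 1
```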

One of the most fundamental concepts in homotopy type theory is the identity type family, representing equality between objects with a given type. This is an inductive type family, generated by the reflexivity constructor giving an equality between an object and itself.
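A sketch of the definition:

```agda
data _==_ {A : Set} : A → A → Set where
  refl : (a : A) → a == a   -- reflexivity: every object equals itself
```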

Note that while this is an inductive type family, for a fixed object a the type $a==a$ is not an inductive type defined by $refl(a)$, i.e., we cannot define (dependent) functions on this type by just defining them on the reflexivity constructor. This is a subtle point, which will become clearer as we look at the topological interpretation. We shall study the identity type extensively.
For now, let us show some basic properties of the identity type. All these are proved by constructing objects by pattern matching (recall that these are dependent functions, so we are formally constructing them by induction, not recursion).
Firstly, if $f$ is a function and $a==b$ (i.e., there is an object of type $a==b$), then $f(a)==f(b)$ (again, we mean there is an object of this type).
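A sketch, with the traditional name $cong$ (for congruence):

```agda
cong : {A B : Set} → (f : A → B) → {a b : A} → a == b → (f a) == (f b)
cong f (refl a) = refl (f a)   -- matching on refl forces b to be a
```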

Further, we see that equality (given by the identity type) is symmetric and transitive.
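Sketches of both proofs (the proof of transitivity uses the dot pattern discussed below):

```agda
sym : {A : Set} → {a b : A} → a == b → b == a
sym (refl a) = refl a

trans : {A : Set} → {a b c : A} → a == b → b == c → a == c
trans (refl a) (refl .a) = refl a   -- .a : the types force this argument to be a
```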

There is a new Agda feature we have used in the proof of transitivity: the dot notation. Notice that we have a term $.a$: this says that we can deduce, from the types, that $a$ is the only possibility at its position in the pattern.
Suppose we have a type $A$ and a family of types $P(a)$ (which we regard as propositions), with a type associated to each object $a$ of type $A$. Then all the types $P(a)$ have objects (i.e., all corresponding propositions are true) if and only if there is a dependent function mapping each object $a$ of type $A$ to an object of type $P(a)$. Thus, the logical statement $\forall a : A,\ P(a)$
translates to the product type $\prod_{a : A} P(a)$.
In Agda, we represent the product type as $(a : A) \to P(a)$
We can now prove a more substantial result. Namely, suppose we have a function $f : \Bbb{N}\to \Bbb{N}$ , and suppose for all $n\in \Bbb{N}$ we have $f(n) = f(n+1)$, then we show that, for all $n$, $f(n)=f(0)$.
Such a statement is normally proved by induction (indeed any proof must reduce to induction). We see that our pattern matching rules for constructing dependent functions suffice to prove this; this is why the dependent function that gives them is called the induction function.

Let us look at the statement and proof more carefully. Firstly, note that the statement is of the form $(f : \Bbb{N} \to \Bbb{N}) \to (P(f) \to Q(f))$.
As we have seen, the first term is a translation of $\forall f : \Bbb{N} \to \Bbb{N}$, so the statement says that for all functions $f : \Bbb{N} \to \Bbb{N}$ (or any other codomain with obvious modifications), we have $P(f) \to Q(f)$.
This in turn is a translation of $P(f) \implies Q(f)$. So we can regard $P(f)$ as the hypothesis, for a fixed function $f$, and $Q(f)$ as the desired conclusion.
The hypothesis $P(f)$ is the statement $(m : \Bbb{N}) \to f(m+1) == f(m)$,
which is just the statement that for all $m$, $f(m+1)=f(m)$. Finally, the conclusion $Q(f)$ just says that $f(n)=f(0)$ for all $n$.
We now look at the proof. The two constructors correspond to the base case and the induction step. Here the base case is just $f(0)=f(0)$, which follows from reflexivity of equality.
The induction step is more complicated. We prove the result for $n+1$ assuming the result for $n$. The main hypothesis $\forall m, f(m+1) = f(m)$, is, by pattern matching, $adjEq$. The right hand side, which is the proof in the case of $n+1$, is obtained from:
This proof is remarkable in many ways. First of all, note that it is no longer or more complicated than an informal proof. Further, note that we did not have to invoke the usual induction axiom schema, but instead just used the rules for constructing objects. Most significantly, as most of our types are inductive types (or type families), we get recursive definitions and inductive proofs in all these cases.
Indeed, using recursive definitions for inductive types we get all so-called universal properties. Furthermore, traditionally universal properties require a separate uniqueness statement. But recursion-induction is powerful enough to even give the uniqueness statements for universal properties. This means a great deal of mathematical sophistication (universal algebra, various aspects of category theory) is encapsulated in these simple functions.
We now see the types corresponding to the other ways of combining propositions: and, or, and there exists.
Firstly, if $A$ and $B$ are types corresponding to propositions, then there are objects with each of these types if and only if there is a pair $(a, b)$ of the pair type $A \times B$, which we define as follows.
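A sketch of the definition, with the single constructor written $[\_,\_]$:

```agda
data _×_ (A B : Set) : Set where
  [_,_] : A → B → A × B   -- a pair [ a , b ] from an object of A and one of B
```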

We have to use square brackets, as parentheses have a special meaning in Agda. Observe that there is a single constructor that takes as input an object in $A$ and an object in $B$. We regard the result as the pair.
Next, suppose $A$ and $B$ are types corresponding to propositions and we wish to construct a type corresponding to $A$ or $B$, then we require a type whose elements are elements of $A$ and elements of $B$, or more accurately the images of such elements under constructors. This is the direct sum type.
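A sketch, with the two inclusion constructors named $inl$ and $inr$ (the names are my choice):

```agda
data _⊕_ (A B : Set) : Set where
  inl : A → A ⊕ B   -- image of an element of A
  inr : B → A ⊕ B   -- image of an element of B
```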

Finally, if we are given a collection of types $B(a)$ for objects $a$ of type $A$, we construct the type corresponding to at least one of these types having an element, i.e., a there exists type.
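A sketch, with the constructor name $witness$ my choice:

```agda
data ∃ (A : Set) (B : A → Set) : Set where
  witness : (a : A) → B a → ∃ A B   -- an element a together with an element of B(a)
```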

Notice that the constructor for this type has as input an element $a$ and an element of type $B(a)$. Such elements exist if and only if some $B(a)$ is nonempty, i.e., the corresponding proposition is true.
As we see, we can express all the usual mathematical statements using types built up using our basic constructions: inductive types, functions and dependent functions. We have also seen that the basic rules for constructing objects are powerful rules of deduction. However, there are some things they cannot deduce, for instance the statement (called the axiom of extensionality) that if $f, g: A\to B$ are functions with $f(a)=g(a)$ for all $a \in A$, then $f=g$. Hence, we have to introduce this as a postulate: we just postulate that there is an object (which we give a name) of a given type.
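A sketch of such a postulate, using the identity type $\_==\_$ from before:

```agda
postulate
  extensionality : {A B : Set} → (f g : A → B) →
                   ((a : A) → (f a) == (g a)) → f == g
```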

We can similarly introduce axioms specific to a domain, say Euclidean geometry, by postulating them in a module.
Let us start with a function checking if a list contains an element with a certain property.
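A sketch of such a function, using a helper $\_or\_$ on Booleans (the helper is mine; $Bool$ and $List$ are from earlier posts):

```agda
_or_ : Bool → Bool → Bool
true or _  = true
false or b = b

contains : {A : Set} → List A → (A → Bool) → Bool
contains [] p       = false                       -- the empty list contains nothing
contains (a :: l) p = (p a) or (contains l p)     -- a satisfies p, or the rest contains one
```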

Before turning to filtering, we define an if expression, which gives one of two values depending on whether a Boolean term is $true$ or $false$. Note that this is not an if statement, doing something according to a condition, but a function that returns a value depending on the condition.
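A sketch, using Agda's mixfix syntax so the expression reads naturally:

```agda
if_then_else_ : {A : Set} → Bool → A → A → A
if true then x else y  = x
if false then x else y = y
```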

With this in hand, we define a function filtering elements of a list by a given property. This is defined inductively, using the if expression to decide whether to prepend the first element.
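A sketch, using the if expression just defined:

```agda
filter : {A : Set} → List A → (A → Bool) → List A
filter [] p       = []
filter (a :: l) p = if (p a) then (a :: (filter l p)) else (filter l p)
```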

So far we have defined many objects and types, but not actually computed anything concrete. To do this, we shall use a very useful function on lists, folding. This is a function that takes as input:
The fold is obtained by starting with the given element in $B$, and successively multiplying on the left by elements of the list, starting with the rightmost. This stops when the list is empty, giving an element in $B$. Thus, if we omit parentheses (assuming associativity of the operation), and if $*$ denotes the operation, then folding is the function $(a_1 :: a_2 :: \dots :: a_n :: [],\ b) \mapsto a_1 * a_2 * \dots * a_n * b$.
As usual, we define the fold function recursively.
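A sketch of the recursive definition:

```agda
fold : {A B : Set} → (A → B → B) → B → List A → B
fold op b []       = b                       -- empty list: just the starting element
fold op b (a :: l) = op a (fold op b l)      -- multiply on the left by the head
```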

Equipped with this, we now give a function computing the sums of squares of numbers from 1 to n. After importing natural numbers, we define (recursively) a function that gives the list of numbers from 1 to n. We then map this to get a list of squares. Finally, we fold this by the addition function to get the sum of squares.
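A sketch, assuming $\_+\_$, $\_*\_$, $\_++\_$ and $map$ from the earlier posts (the names $upto$ and $sumOfSquares$ are mine):

```agda
upto : ℕ → List ℕ
upto zero     = []
upto (succ n) = (upto n) ++ ((succ n) :: [])   -- list 1, 2, ..., n

sumOfSquares : ℕ → ℕ
sumOfSquares n = fold _+_ zero (map (λ m → m * m) (upto n))
```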

In the Agda mode of emacs, we can evaluate, for example, the sum of squares up to $20$ to get $2870$. This illustrates that what we are doing does include general purpose programming: indeed it implements an older model of computing, due to Church, which is equivalent to what may be more familiar models.
We next define a function that finds an element in a list (of type $A$) having a given property. However the list may have no such element, so we cannot always return an object in $A$. Simply giving an error is meaningless mathematically and a bad idea for a program.
To deal with such a situation, we use the type $Option A$, whose objects are $Some a$ for objects $a$ of type $A$, and an object $None$ representing no result.
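A sketch of the definition:

```agda
data Option (A : Set) : Set where
  Some : A → Option A   -- a result
  None : Option A       -- no result
```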

We can now define a find function, returning an option type, containing an object in the list with the given property if there is such an object.
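A sketch, using the if expression from before:

```agda
find : {A : Set} → List A → (A → Bool) → Option A
find [] p       = None
find (a :: l) p = if (p a) then (Some a) else (find l p)
```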

Thus, using option types, we can represent partially defined functions: those defined on some objects of a type $A$. One often encounters partial functions: finding in a list, taking square roots, division (avoiding division by $0$), etc. We can lift such a partially defined function to one taking values in an option type. We shall identify partial functions with their lifts.
Suppose we have a partially defined function $f: A\to B$ (not defined on some values of $A$ in general) and a function $g : B \to C$ (defined everywhere). Then it is natural to define the composition of $f$ and $g$ as a partial function defined wherever $f$ is defined. Passing to lifts, this is accomplished by the map function on option types.
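A sketch of the map function on option types:

```agda
mapOption : {B C : Set} → (B → C) → Option B → Option C
mapOption g None     = None          -- f undefined, so the composition is too
mapOption g (Some b) = Some (g b)    -- apply g to the result of f
```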

Even better, if $f: A\to B$ and $g : B\to C$ are both partially defined, we can compose them to get a value in $C$ exactly for those elements $a : A$ for which $f(a)$ is defined and lies in the domain of $g$. We do this by flatmapping (after passing to lifts once more).
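A sketch of flatmap on option types:

```agda
flatMapOption : {B C : Set} → Option B → (B → Option C) → Option C
flatMapOption None g     = None    -- f was undefined
flatMapOption (Some b) g = g b     -- g b is itself an Option C
```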

There are some obvious common features between lists and option types.
Types that have these properties (and some consistency relations between the embeddings, map and flatmap) are called Monadic. There are many other useful such types, for example $Future A$ is used to represent an object of type $A$ that will be available in the future, for instance after a long computation. We do not have to wait for the computation to be over before saying what to do with the result when available, even if this involves another long computation which will return a result as a Future.
Dependent functions generalize functions, with the above properties continuing to hold. What we no longer require is that $f(a)$ has a fixed type independent of $a$, namely the codomain $B$. Instead we have a family of codomains $B(a)$, so that $f(a)$ has type $B(a)$.
Such objects are common in mathematics (and physics). For example, the velocity of water flowing on a sphere gives a vector field on the sphere. At a point $x$ on the sphere, the value of the vector field $V$ lies in the tangent space at the point, i.e., $V(x) \in T_x S^2$.
Hence it is best to view vector fields as dependent functions. In mathematics, the codomains are generally called fibers, which together form a fiber bundle, and dependent functions are called sections of this bundle.
We can (and often do) regard a dependent function as an ordinary function with codomain the union of the various codomains. But, besides losing information, the function we get is not natural, in the sense that it does not respect the underlying symmetries.
We now turn to some simple examples and code. First we consider type families, which give the collection of codomains for dependent functions. The typical example is vectors of length $n$ of elements of a type $A$. Formally, a type family is just a function with codomain a universe, so the values the function takes are types.
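A sketch of the definition (reusing the constructor names of lists, which Agda permits):

```agda
data Vec (A : Set) : ℕ → Set where
  []   : Vec A zero                              -- the empty vector has length 0
  _::_ : {n : ℕ} → A → Vec A n → Vec A (succ n)  -- prepending increases length by 1
```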

This gives a family of types parametrized by A and indexed by natural numbers. The difference between parameters and indices is a bit subtle but crucial. Observe that the Agda syntax treats them quite differently.
We defined Booleans and natural numbers using the data statement, and defined functions on them by pattern matching. More conceptually, these are inductive types, and functions defined on them are defined by applying the recursion function. For instance, in the case of Booleans, the recursion function takes as input a type $A$ and two objects with that type (the values of $true$ and $false$) and gives a function from Booleans to $A$.
In the case of lists, for each type $A$, we obtain a corresponding inductive type. Thus we have a family of inductive types, parametrized by the type $A$.
In the case of vectors too, the type $A$ acts as a parameter. Assume that the type $A$ is fixed, so vectors are now a family of types indexed by natural numbers.
However, the vectors of a fixed size (say $7$) do not form an inductive type: we cannot define a function recursively on vectors of length $7$ alone. In this case, this is evident from the definition, as the constructor giving vectors of size $7$ uses vectors of size $6$. So a recursive definition must also involve vectors of size $6$, hence of $5$, etc.
We can, however, recursively define functions on vectors of all sizes, i.e., for all values of the index. For example, here is the function that appends (adds to the end) an element to a vector.
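A sketch; note that the length in the result type goes up by one:

```agda
append : {A : Set} → {n : ℕ} → Vec A n → A → Vec A (succ n)
append [] x       = x :: []
append (a :: v) x = a :: (append v x)
```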

Thus, vectors form an inductive type family indexed by natural numbers (and parametrized by A). As we remarked, the type for a given index is not an inductive type. Note that even in cases where we can meaningfully write down a recursion rule for the type at a fixed index, such a recursion rule does not in general give a function on that type.
Remark: From the point of view of programming languages, there is another sense in which indexing by natural numbers is different from parametrizing by types  the types we construct depend on objects, not just other types. Modern languages usually allow types to depend on other types (sometimes called generics), but most do not allow dependence on objects.
We shall now construct a dependent function countdown that maps a natural number $n$ to the list consisting of natural numbers from $n$ down to $0$. Thus the type of $countdown(n)$ is vectors of natural numbers of length $n+1$.
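A sketch of the definition:

```agda
countdown : (n : ℕ) → Vec ℕ (succ n)
countdown zero     = zero :: []                  -- counting down from 0: just [0]
countdown (succ m) = (succ m) :: (countdown m)   -- n, then count down from n - 1
```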

The definition in terms of pattern matching is similar to recursive definitions of functions. In terms of homotopy type theory, dependent functions on inductive types are constructed by applying a dependent function called the induction function to the data.
The type of a dependent function is called the product type corresponding to the family of types with base the domain. For instance, the type of countdown is $(n : \Bbb{N}) \to Vec\ \Bbb{N}\ (n+1)$.
Except for universes (which we will keep in the background as far as possible), we have now seen all the type constructions: inductive types, functions and dependent functions.
A principal use of types in programming is to avoid writing meaningless expressions, by ensuring that such expressions violate the rules for constructing objects and types. Dependent types are even better than this. For instance, consider componentwise addition of vectors, or more generally componentwise application of a binary operation. This makes sense only when both vectors have the same length. Using dependent functions and types, we can define the function in such a way that it is defined only on pairs of vectors with the same length.
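A sketch; the name $zipWith$ is my choice. As both vectors share the index $n$, applying the function to vectors of different lengths is a type error, caught at compile time:

```agda
zipWith : {A : Set} → {n : ℕ} → (A → A → A) → Vec A n → Vec A n → Vec A n
zipWith op [] []             = []
zipWith op (a :: v) (b :: w) = (op a b) :: (zipWith op v w)
```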

Note that we could have used lists in place of vectors, but we would then have to give a definition that can lead to errors at runtime.
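The natural numbers are an inductive type with two constructors, as described below (a sketch):

```agda
data ℕ : Set where
  zero : ℕ       -- 0
  succ : ℕ → ℕ   -- the successor function
```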

Here $zero$ is $0$ and $succ$ is the function taking any natural number to its successor. Any natural number can be built from these, and different expressions give different numbers. However, as with the case of Booleans, we do not have any explicit rules giving such statements; indeed, it would be quite painful to formulate such rules, as we would have to define expressions, equality of expressions, etc. Instead, the rules for defining functions from natural numbers implicitly contain such statements. We define functions on natural numbers by pattern matching, but this time the patterns can involve the constructors.
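For instance, addition can be defined as below (a sketch):

```agda
_+_ : ℕ → ℕ → ℕ
zero + n     = n
(succ m) + n = succ (m + n)   -- recursive: rewrites a sum in terms of a simpler sum
```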

The left hand sides pattern match on the constructors for natural numbers. Notice that this is a recursive definition: the right hand side in the second pattern for defining addition also involves addition. This is fine, as the definition lets us rewrite a sum in terms of sums of simpler terms, and such a process eventually terminates. However, we can write nonsensical definitions in general.
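For instance, a definition along the following lines (a sketch, which Agda in fact rejects) never terminates:

```agda
forever : ℕ → ℕ
forever zero     = zero
forever (succ n) = forever (succ (succ n))   -- forever(1) = forever(2) = forever(3) = ...
```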

When we try to compute $forever(1)$ (where $1 = succ(zero)$), we get that $forever(1) = forever(2)$, and this continues forever. Fortunately, Agda will not let us make such definitions. This does come at a price though: sometimes it takes a lot of work to convince Agda that functions terminate. There are deeper, conceptual limitations to any system ensuring termination, as discovered by Turing, which we return to in the future.
In homotopy type theory, the rules for such recursive definitions are nicely formulated so that we cannot make nonsensical definitions. We shall see how this is done much later. We now define a couple of other recursive functions.
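For instance, multiplication and the factorial can be defined recursively (a sketch; the particular functions are my guesses):

```agda
_*_ : ℕ → ℕ → ℕ
zero * n     = zero
(succ m) * n = n + (m * n)     -- (m + 1) * n = n + m * n

factorial : ℕ → ℕ
factorial zero     = succ zero
factorial (succ m) = (succ m) * (factorial m)
```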

Our constructors are unfortunately only as efficient as tally marks for dealing with actual numbers. Agda has a built-in type that lets us deal with natural numbers the usual way. We can invoke it as below.
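A sketch of the pragma in recent versions of Agda (older versions used separate BUILTIN ZERO and SUC pragmas); with it, a numeral like $3$ abbreviates $succ (succ (succ\ zero))$ and arithmetic is computed efficiently:

```agda
{-# BUILTIN NATURAL ℕ #-}
```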

We shall now define the type of Lists of objects, each of which is of a given type A. Thus, we are defining not just one type, but a family of types parametrized by the type A.
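A sketch of the definition:

```agda
data List (A : Set) : Set where
  []   : List A              -- the empty list
  _::_ : A → List A → List A -- prepend an element of A to a list
```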

The constructor $[]$ is just a constant, giving us the empty list. The constructor $ _ :: _$ gives a list from an element $a$ in $A$ and a list $l$ of elements in $A$ by adding $a$ to the beginning of the list $l$. For instance (after importing the natural numbers constructed earlier) we can describe the list [1, 2, 3] as
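A sketch, using numerals for the natural numbers and a name of my choice:

```agda
onetwothree : List ℕ
onetwothree = 1 :: (2 :: (3 :: []))
```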

We can make things a little cleaner by specifying that $_::_$ is right associative. We shall discuss this later. Let us now define the length of a list. Here is our first definition.
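A sketch, with the type $A$ passed explicitly:

```agda
length : (A : Set) → List A → ℕ
length A []       = zero
length A (a :: l) = succ (length A l)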

We have given a recursive definition. We defined a function of the type $A$ as well as the list, as we needed $A$ on the right hand side of the second pattern. But $A$ can be inferred in this pattern: it is the type of the element $a$. So we can declare $A$ to be an implicit argument (by putting it in braces), and let Agda infer its value. This gives us a cleaner definition.
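A sketch of the cleaner version, with $A$ in braces so Agda infers it:

```agda
length : {A : Set} → List A → ℕ
length []       = zero
length (a :: l) = succ (length l)
```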

Next, we define some functions recursively. The first is concatenation, which combines two lists by giving the entries of the first followed by those of the second.
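A sketch, with concatenation written as the operator $\_++\_$:

```agda
_++_ : {A : Set} → List A → List A → List A
[] ++ l        = l
(a :: l) ++ l' = a :: (l ++ l')
```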

We next define the function that reverses a list.
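A sketch, using concatenation to move the head to the end:

```agda
reverse : {A : Set} → List A → List A
reverse []       = []
reverse (a :: l) = (reverse l) ++ (a :: [])
```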

We now turn to some more interesting functions. Given a list of objects of type $A$ and a function $f:A \to B$, we can apply $f$ to each entry of the list to get a list of elements of $B$. This is usually called the $map$ function.
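A sketch of the map function:

```agda
map : {A B : Set} → (A → B) → List A → List B
map f []       = []
map f (a :: l) = (f a) :: (map f l)
```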

We can do more  if we have a function $f: A \to List B$, then we can map a list $l$ of elements of $A$ to a list of elements of $B$  each element of $l$ maps to a list of elements of $B$, and we get a list by concatenating these lists together.
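A sketch of the flatmap function, concatenating the image lists:

```agda
flatMap : {A B : Set} → (A → List B) → List A → List B
flatMap f []       = []
flatMap f (a :: l) = (f a) ++ (flatMap f l)
```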

In case you did not notice, we have been programming in a functional language. To those used to imperative languages (say C), this may not look like programming, but programming in functional languages is essentially building functions, collections and structures. Indeed, anyone who has programmed in (for example) Scala for a while hungers for a flatmap function. We will eventually see some of the other main ingredients of collections in functional languages: finding in, filtering and folding lists. But before that we shall take a first look at dependent types.
Mathematicians like to imagine that their papers could in principle be formalized in set theory. This gives them a feeling of security, not unlike the one experienced by a devout man entering a venerable cathedral. It is a form of faith professed by logicians. Homotopy Type Theory is an alternative foundation to set theory.
In this series of posts, we explore the background behind, and the basics of, homotopy type theory. We begin by laying out the basic rules governing such foundations.
We regard almost everything in mathematics as an object: not just numbers or groups, but even theorems, proofs, axioms, constructions, etc. This lets us consider functions acting on, and collections of, just about everything. Strictly speaking, objects should be called terms, based on standard terminology of logic, but I shall use the word object as it better represents how I feel we should think about them.
Every object has a type, which is generally (but not always) unique. Types themselves are objects, so in particular have types. A type whose objects are themselves types is called a universe. Every set is a type, but not all types are sets.
Our foundations are governed by rules telling us the ways in which we can construct objects, which we will see as we go on. Further, we have rules letting us make two types of judgements: that an object $a$ has a type $A$, and that two objects (or types) are equal by definition.
Objects may be equal even if they are not so by definition. For instance, for a real number x, we have the equality
which it would be silly to say is true by definition. Rather, it is a proposition that is true, and we say the two sides are propositionally equal (but not definitionally or judgementally equal). Similar considerations also apply to objects having specified types.
In addition to constructing objects by our rules, we can introduce objects and make some judgements related to them as axioms  this may of course introduce inconsistencies.
Let us begin constructing objects. We shall do this in the language/formal proof system Agda, but I will try to be self-contained.
We start with a built-in universe called Set: objects having type Set are sets. We shall use the data statement to construct the type of Booleans, which is a basic logical type used to represent whether a statement is true or false.
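A sketch of the data statement:

```agda
data Bool : Set where
  true  : Bool
  false : Bool
```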

The above code gives the construction of three objects: a type called Bool and two objects true and false that have type Bool. This is a simple rule for constructing a type  give it a name and list objects in it.
Note that we do not have an explicit rule saying that the type Bool has no other objects, or even that true and false are different objects. Indeed we will not introduce any such rule, but rules for constructing objects will implicitly tell us that Bool has exactly two distinct objects, true and false.
A central role in type theory is played by function types: one may even say that the principal virtue of type theory is treating functions with the respect they deserve (as any good programming language does), instead of trying to wrap them up as subsets of cartesian products that happen to satisfy some properties. We now define our first function, and our first function type.
Given types A and B, functions from A to B give a type, which we denote $A \to B$.
We construct the function $not$ from Booleans to Booleans, which is the logical negation.
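A sketch of the definition:

```agda
not : Bool → Bool
not true  = false
not false = true
```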

We have defined the function not by giving the values on each Boolean constant, and these values are constants. This is the simplest form of pattern matching.
Note that being able to construct such a function means $true$ and $false$ are distinct objects as promised.
Let us now turn to functions of two variables. By a trick due to the logician Haskell Curry, we do not have to introduce a new type. Instead, observe that if we have a function $f(x,y)$ of two variables (of types $A$ and $B$, taking values of type $C$), then for fixed $x$ we get a function of $y$ only
$f(x , _ ) = y \mapsto f(x,y)$
so $f(x, _ )$ has the type of functions from $B$ to $C$. Now viewing $f(x, _)$ as a function of $x$, we get the curried form of $f$,
which has type $A \to (B \to C)$.
We define the logical and (in its curried form). A convenient Agda feature is that we can define functions so that their arguments can appear in the beginning, end, middle or any other combination, just by putting underscores as placeholders.
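A sketch, writing the operation infix as $\_\&\_$ (the symbol is my choice):

```agda
_&_ : Bool → Bool → Bool
true & b  = b       -- if the first argument is true, the result is the second
false & _ = false   -- underscore: the second argument does not affect the result
```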

The definition is a small extension of pattern matching we used in defining not. If some argument does not affect the result, we can use an underscore to represent all cases.
So far we have defined functions by defining their values on each constant as a constant. In general we use variables to represent arguments, and the values are built from existing objects. Here is an example.
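An example of this kind (the particular function is my guess) applies $not$ twice, using a variable $b$ for the argument:

```agda
notnot : Bool → Bool
notnot b = not (not b)
```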

We used a variable for the argument of the function, another generalization of pattern matching. The right hand side was built from previously constructed objects using function application: a function $f$ from $A$ to $B$ can be applied to an object $a$ with type $A$ to get an object $f(a)$ with type $B$.
Functions can be specified by describing the image of each argument. These are given by λ-expressions, following the notation of Church's λ-calculus. We define such a function, $verytrue$, below, by equating it with an anonymous function that forms the right hand side.
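A sketch (the body shown is my guess; what matters is the λ-expression on the right hand side):

```agda
verytrue : Bool → Bool
verytrue = λ b → not (not b)
```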

We can define (curried) functions of more than one variable using λ-expressions without having to explicitly nest the λs.
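A sketch (the particular function is my guess; it assumes the and operation defined above):

```agda
nand : Bool → Bool → Bool
nand = λ a b → not (a & b)   -- λ with two variables, no explicit nesting
```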

We have now seen the most basic constructions of types and objects. We shall next turn to the more powerful constructions: inductive types and dependent types.