Introduction and Set Theory

Course of Lectures on

First-order logic
Set Theory
$\lambda$-calculus
Type Theory

https://siddhartha-gadgil.github.io/TrainLogic/

Formal Methods

Mathematical proofs checked by computers.

The Pentium FDIV bug

Thomas Nicely, a professor of mathematics at Lynchburg College, had written code to enumerate primes, twin primes, prime triplets, and prime quadruplets.
Nicely noticed some inconsistencies in the calculations shortly after adding a Pentium system to his group of computers
After eliminating other factors he reported the issue to Intel.

This was a bug in the pentium chip due to missing entries in the lookup table used by the floating-point division circuitry.
This bug had escaped testing.
This caused Intel to take a $475M charge against earnings and the management said "don't ever let this happen again".
In 1996 they started proving properties of the Pentium processor FPU.

Then in 1997 a bug was discovered in the FIST instruction (that converts floating point numbers to integers) in the formally verified correct Pentium Pro FPU.
It was a protocol mismatch between two blocks not accounted for in the informal arguments.
So they went back to square one and during 1997-98 the verified the entire FPU against high-level specs so that mismatches like the FIST bug could no longer escape.
During 1997-99 the Pentium 4 processor was verified and there were no escapes.

TimSort

Tim Peters developed the Timsort hybrid sorting algorithm in 2002 - this is the standard sorting algorithm in Java, Android, Python etc.
It is a clever combination of ideas from merge sort and insertion sort, and designed to perform well on real world data.
Specifically, data is grouped into runs, sequences that are already ordered, using insertion sort if needed. Runs are merged as in merge sort.
Run lengths are stored on a stack, which should be small for performance reasons.

After having successfully verified Counting and Radix sort implementations in Java, the Dutch-German Envisage group turned to proving the correctness of TimSort.
After struggling to prove this, they realized they could not because the implementation was in fact wrong.
Specifically the size of the stack for run lengths was estimated based on a (growth) property of run lengths, but this did not always hold.
They proved that a modified version of the property holds, and a corresponding algorithm does work.

Conclusions from TimSort

Formal methods are often classified as irrelevant and/or impracticable by practitioners. This is not true: the researchers found and fixed a bug in a piece of software that is used by billions of users every single day.
Even though the bug itself is unlikely to occur, it is easy to see how it could be used in an attack.

Using Formal methods

Formal methods are mathematical proofs checked by computers.
What are mathematical proofs?
How do we know that the program checking the proof is correct?
Why does everyone not do so already?

Why not always?

Formal proving takes work, so we have a trade-off between productivity and safety.
We have the same trade-off between dynamic and static typing; and often productivity wins.
Better languages and tooling can help in getting safety without losing too much productivity.

Who guards the guards?

Formal proof systems have a small trusted kernel.
All proofs are verified, directly or indirectly, by the kernel.
The trusted kernel itself should be checked independently and thoroughly several times.
For example, lean theorem prover has independent type checkers (i.e. kernels) in three languages.

Foundations: Logic, Sets, Types

The usual foundations of mathematics are Set Theory as a theory in First-order logic.
Computation can be founded on $\lambda$-calculus or two other equivalent models.
Type theory gives common foundations integrating proofs and computations.
All these are based on capabilities to perform basic operation such as pattern matching on strings; and agreement on the meanings of such operations.

Foundations

of

Mathematics (and Computation)

Decide whether the following are true or false.
- $2 < 3$
- $ 3 > 4$
- $3 + 12$
- $3 += 4++ $
We can only ask whether an expression is true or false provided:
- It is a well-formed (meaningful) expression.
- It represents a statement (not an object such as a number or set).

Layers of foundations

A language, given by rules of its grammar/syntax, that specifies well-formed expressions as well as types of these expressions.
Rules of deduction letting us conclude that a statement is true from other true statements.
Axioms: statements that are known to be true.
Rules are syntactic in nature, i.e., in terms of manipulating and pattern matching strings.
Rules can be computed quickly.

Generative grammars

The languages we consider are given by
- an initial vocabulary of expressions (with types),
- (introduction) rules to form new expressions from existing ones and assigning types to these,
- such that the rules depend only on the types of the expressions.
Well-formed expressions are exactly those that can be formed from the vocabulary using the introduction rules.
We can prove properties of such expressions by induction.

Natural and programming languages

For example in English
- The vocabulary has nouns, verbs, adjectives etc.
- A single noun forms a noun phrase, e.g. lecture.
- An adjective followed by a noun phrase forms a noun phrase, e.g. boring lecture.
Similarly, in most programming languages, if x and y are expressions with type Int, we can form the expression x + y of type Int.
We can also form the Boolean expression x < y.

Mathematical expressions and their types are determined by the syntax of a generative language.
In particular, it is straightforward to generate all expressions, and to check the validity and type of an expression.
Similarly valid deductions are those given by repeated application of syntactic rules.
For example, the rule Modus Ponens says that for formulas $P$ and $Q$, from $P$ and $P \implies Q$ we can deduce $Q$.
A specific class of such languages and rules of deduction constitute First-order logic, on which the usual foundations of mathematics are based.

Theories and Sets

A theory is a (first-order) language together with a collection of axioms, i.e., statements that are taken to be true.
A specific first-order language and set of axioms constitute Set Theory.
All of mathematics can be encoded in terms of Set Theory. However this is verbose and opaque, like Bytecode/assembly language .
Hence any practical formalization depends on using some combination of a proof assistant and more expressive foundations.

Verification in the real world

The real-world problem needs to be (correctly) modelled mathematically/logically.
The mathematical model needs to be (correctly) embedded in Set theory or other appropriate foundations.
By results of logic, statements that are deduced following the rules of syntax are true in the Set Theoretic model.

First-order Languages

Data for a language

First-order languages (e.g. Arithmetic, Sets) are specified by (possibly empty) lists of
- (names of) constants, e.g. $0$, $1$, $\phi$.
- (names of) functions together with their degree/arities, e.g., addition ($+$) with degree $2$.
- (names of) relations together with their degree/arities, e.g., less than ($<$) and belongs to ($\in$) with degree $2$.
- (names of) variables - these can be assumed to be independent of the language.

Expressions

We form two types of expressions:
- Terms: representing numbers, sets etc.
- Formulas: representing properties of numbers, sets, etc.
Both terms and formulas may depend on variables.

Terms

A variable $x$ is a term.
A constant $c$ is a term.
If $f$ is a function of degree $n$ and $t_1$, $t_2$, $\dots$, $t_n$ are terms then $f(t_1, t_2,\dots,t_n)$ is a term.

Formulas

If $t_1$ and $t_2$ are terms, then $t_1 = t_2$ is a formula.
If $p$ is a relation of degree $n$ and $t_1$, $t_2$, $\dots$, $t_n$ are terms then $p(t_1, t_2,\dots,t_n)$ is a term.
If $P$ and $Q$ are formulas then $P\wedge Q$, $P\vee Q$, $P\implies Q$, $P\iff Q$, and $\neg P$ are formulas.
If $P$ is a formula and $x$ is a variable, then $\forall x P$ and $\exists x P$ are formulas.
We recursively define free variables of terms and formulas. A formula is closed if it has no free variables.

Set Theory

Sets as collections

A Set $S$ is a collection of objects, with no other structure.
Thus, we have a relation $x \in S$, for any object $x$, corresponding to $x$ being a member of $S$.
Further, if $S$ and $T$ are sets with the same members they are equal, i.e. if $$\forall x\ x\in S \iff x\in T$$ then $S= T$ (Axiom of extension).

Language of Sets

We have a first-order language that describes sets with
- a single relation $\in$ (is a member of) of degree $2$.
- a single constant $\phi$ (the empty set).
For example, the axiom of extensionality can be described in this language by the formula $$\forall x\forall y (\forall a\ a\in x \iff a \in y) \implies x = y.$$
Here everything is a set, including members of sets.

What collections?

We need some way to construct sets.
A natural way to try to do this is to say sets are collections of objects that satisfy some property, e.g. $$\{p: \text{$p$ is a prime number} \}.$$
We can make this formal by, for example, defining properties to be closed formulas in the first-order language of sets (alternatively as closed formulas in some first-order language).

Russell's Paradox

Assume we had sets corresponding to all closed formulas in the first-order language of sets.
Consider the set $$ S = \{X : X \notin X\}.$$
Question: Is $S\in S$?
We see that
- if $S \notin S$ then $S \in S$,
- if $S\in S$ then $S \notin S$.

Sets from Axioms

The axiom of specification allows us to define subsets of a set by a property.
Thus, if we have previously defined the natural numbers $\mathbb{N}$, then we can define the set $$\{p \in\mathbb{N} : \text{$p$ is a prime number} \}.$$
Most axioms of set theory are statements giving us the existence of sets.

More axioms

In our setup, we need an axiom for the empty set: $\forall x (\lnot(x\in \phi))$
We have axioms giving the unions of pairs and of arbitrary collections of sets.
The axiom of infinity, says that there exists a set $S$ such that
- $\phi\in S$.
- $(x\in S) \implies ((x\cup \{x\}) \in S)$
There are also two technical axioms: axioms of regularity and replacement, completing ZF.

Axiom of Choice and transfinite recursion

The axiom of choice is an axiom independent of ZF - including this gives ZFC.
This says: let $(S_i)_{i\in I}$ be an indexed family of non-empty sets, where $I$ is the index set. Then there is an indexed family of elements $(x_i)_{i\in I}$ such that $x_i\in S_i$ for all $i\in I$.
In the presence of the axiom of choice, we can make constructions by transfinite recursion
For instance we get the trichotomy theorem: for sets $A$ and $B$, at least one of $|A|\leq |B|$ and $|B|\leq |A|$ holds.