Formal Methods
Mathematical proofs checked by computers.
The Pentium FDIV bug

Thomas Nicely, a professor of mathematics at Lynchburg College, had written code to enumerate primes,
twin primes, prime triplets, and prime quadruplets.

Nicely noticed some inconsistencies in the calculations shortly after adding a Pentium system to his group of computers

After eliminating other factors he reported the issue to Intel.
 This was a bug in the pentium chip due to missing entries in the lookup table used by the floatingpoint division circuitry.
 This bug had escaped testing.

This caused Intel to take a $475M charge against earnings and the management said "don't ever let this happen again".
 In 1996 they started proving properties of the Pentium processor FPU.
 Then in 1997 a bug was discovered in the FIST instruction (that converts floating point numbers to integers)
in the formally verified correct Pentium Pro FPU.
 It was a protocol mismatch between two blocks not accounted for in the informal arguments.
 So they went back to square one and during 199798 the verified the entire FPU against highlevel specs
so that mismatches like the FIST bug could no longer escape.
 During 199799 the Pentium 4 processor was verified and there were no escapes.
TimSort
 Tim Peters developed the Timsort hybrid sorting algorithm in 2002  this is the standard sorting algorithm in Java, Android, Python etc.
 It is a clever combination of ideas from merge sort and insertion sort, and designed to perform well on real world data.
 Specifically, data is grouped into runs, sequences that are already ordered, using insertion sort if needed.
Runs are merged as in merge sort.
 Run lengths are stored on a stack, which should be small for performance reasons.
 After having successfully verified Counting and Radix sort implementations in Java, the DutchGerman Envisage group
turned to proving the correctness of TimSort.
 After struggling to prove this, they realized they could not because the implementation was in fact wrong.
 Specifically the size of the stack for run lengths was estimated based on a (growth) property of run lengths, but this did not always hold.
 They proved that a modified version of the property holds, and a corresponding algorithm does work.
Conclusions from TimSort
 Formal methods are often classified as irrelevant and/or impracticable by practitioners.
This is not true: the researchers found and fixed a bug in a piece of software that is used by billions of users every single day.
 Even though the bug itself is unlikely to occur, it is easy to see how it could be used in an attack.
Using Formal methods
 Formal methods are mathematical proofs checked by computers.
 What are mathematical proofs?
 How do we know that the program checking the proof is correct?
 Why does everyone not do so already?
Why not always?
 Formal proving takes work, so we have a tradeoff between productivity and safety.
 We have the same tradeoff between dynamic and static typing; and often productivity wins.
 Better languages and tooling can help in getting safety without losing too much productivity.
Who guards the guards?
 Formal proof systems have a small trusted kernel.
 All proofs are verified, directly or indirectly, by the kernel.
 The trusted kernel itself should be checked independently and thoroughly several times.
 For example, lean theorem prover has independent type checkers (i.e. kernels) in three languages.
Foundations: Logic, Sets, Types

The usual foundations of mathematics are Set Theory as a theory in Firstorder logic.
 Computation can be founded on $\lambda$calculus or two other equivalent models.
 Type theory gives common foundations integrating proofs and computations.
 All these are based on capabilities to perform basic operation
such as pattern matching on strings; and agreement on the meanings of such operations.
Foundations
of
Mathematics (and Computation)

Decide whether the following are true or false.
 $2 < 3$
 $ 3 > 4$
 $3 + 12$
 $3 += 4++ $

We can only ask whether an expression is true or false provided:
 It is a wellformed (meaningful) expression.
 It represents a statement (not an object such as a number or set).
Layers of foundations
 A language, given by rules of its grammar/syntax, that specifies wellformed expressions
as well as types of these expressions.
 Rules of deduction letting us conclude that a statement is true from other true statements.
 Axioms: statements that are known to be true.
 Rules are syntactic in nature, i.e., in terms of manipulating and pattern matching strings.
 Rules can be computed quickly.
Generative grammars
 The languages we consider are given by
 an initial vocabulary of expressions (with types),
 (introduction) rules to form new expressions from existing ones and assigning types to these,
 such that the rules depend only on the types of the expressions.
 Wellformed expressions are exactly those that can be formed from the vocabulary using the introduction rules.
 We can prove properties of such expressions by induction.
Natural and programming languages
 For example in English
 The vocabulary has nouns, verbs, adjectives etc.
 A single noun forms a noun phrase, e.g.
lecture
.
 An adjective followed by a noun phrase forms a noun phrase, e.g.
boring lecture
.
 Similarly, in most programming languages, if
x
and y
are expressions
with type Int
, we can form the expression x + y
of type Int
.
 We can also form the
Boolean
expression x < y
.
 Mathematical expressions and their types are determined by the syntax of a generative language.
 In particular, it is straightforward to generate all expressions, and to check the validity and type of an expression.
 Similarly valid deductions are those given by repeated application of syntactic rules.
 For example, the rule Modus Ponens says that for formulas $P$ and $Q$, from $P$ and $P \implies Q$ we can deduce $Q$.
 A specific class of such languages and rules of deduction constitute Firstorder logic,
on which the usual foundations of mathematics are based.
Theories and Sets
 A theory is a (firstorder) language together with a collection of axioms, i.e., statements that are taken to be true.
 A specific firstorder language and set of axioms constitute Set Theory.
 All of mathematics can be encoded in terms of Set Theory. However this is verbose and opaque, like Bytecode/assembly language .
 Hence any practical formalization depends on using some combination of a proof assistant and more expressive foundations.
Verification in the real world
 The realworld problem needs to be (correctly) modelled mathematically/logically.
 The mathematical model needs to be (correctly) embedded in Set theory or other appropriate foundations.
 By results of logic, statements that are deduced following the rules of syntax are true in the Set Theoretic model.
Data for a language

Firstorder languages (e.g. Arithmetic, Sets) are specified by (possibly empty) lists of
 (names of) constants, e.g. $0$, $1$, $\phi$.
 (names of) functions together with their degree/arities, e.g., addition ($+$) with degree $2$.
 (names of) relations together with their degree/arities, e.g.,
less than ($<$) and belongs to ($\in$) with degree $2$.
 (names of) variables  these can be assumed to be independent of the language.
Expressions
 We form two types of expressions:
 Terms: representing numbers, sets etc.
 Formulas: representing properties of numbers, sets, etc.
 Both terms and formulas may depend on variables.
Terms
Terms are expressions recursively given by the rules
 A variable $x$ is a term.
 A constant $c$ is a term.
 If $f$ is a function of degree $n$ and $t_1$, $t_2$, $\dots$, $t_n$ are terms then $f(t_1, t_2,\dots,t_n)$ is a term.
Formulas
Formulas are expressions recursively given by:
 If $t_1$ and $t_2$ are terms, then $t_1 = t_2$ is a formula.
 If $p$ is a relation of degree $n$ and $t_1$, $t_2$, $\dots$, $t_n$ are terms then $p(t_1, t_2,\dots,t_n)$ is a term.
 If $P$ and $Q$ are formulas then $P\wedge Q$, $P\vee Q$, $P\implies Q$, $P\iff Q$, and $\neg P$ are formulas.
 If $P$ is a formula and $x$ is a variable, then $\forall x P$ and $\exists x P$ are formulas.
 We recursively define free variables of terms and formulas. A formula is closed if it has no free variables.
Sets as collections
 A Set $S$ is a collection of objects, with no other structure.
 Thus, we have a relation $x \in S$, for any object $x$, corresponding to $x$ being a member of $S$.
 Further, if $S$ and $T$ are sets with the same members they are equal, i.e. if
$$\forall x\ x\in S \iff x\in T$$
then $S= T$ (Axiom of extension).
Language of Sets
 We have a firstorder language that describes sets with
 a single relation $\in$ (is a member of) of degree $2$.
 a single constant $\phi$ (the empty set).

For example, the axiom of extensionality can be described in this language by the formula
$$\forall x\forall y (\forall a\ a\in x \iff a \in y) \implies x = y.$$
 Here everything is a set, including members of sets.
What collections?
 We need some way to construct sets.
 A natural way to try to do this is to say sets are collections of objects that satisfy some property, e.g.
$$\{p: \text{$p$ is a prime number} \}.$$
 We can make this formal by, for example, defining properties to be closed formulas
in the firstorder language of sets (alternatively as closed formulas in some firstorder language).
Russell's Paradox
 Assume we had sets corresponding to all closed formulas in the firstorder language of sets.
 Consider the set
$$ S = \{X : X \notin X\}.$$
 Question: Is $S\in S$?

We see that
 if $S \notin S$ then $S \in S$,
 if $S\in S$ then $S \notin S$.
Sets from Axioms
 The axiom of specification allows us to define subsets of a set by a property.
 Thus, if we have previously defined the natural numbers $\mathbb{N}$, then we can define the set
$$\{p \in\mathbb{N} : \text{$p$ is a prime number} \}.$$
 Most axioms of set theory are statements giving us the existence of sets.
More axioms
 In our setup, we need an axiom for the empty set:
$\forall x (\lnot(x\in \phi))$

We have axioms giving the unions of pairs and of arbitrary collections of sets.
 The axiom of infinity, says that there exists a set $S$ such that
 $\phi\in S$.
 $(x\in S) \implies ((x\cup \{x\}) \in S)$

There are also two technical axioms: axioms of regularity and replacement, completing ZF.
Axiom of Choice and transfinite recursion
 The axiom of choice is an axiom independent of ZF
 including this gives ZFC.
 This says: let $(S_i)_{i\in I}$ be an indexed family of nonempty sets, where $I$ is the index set.
Then there is an indexed family of elements $(x_i)_{i\in I}$ such that $x_i\in S_i$ for all $i\in I$.
 In the presence of the axiom of choice, we can make constructions by transfinite recursion

For instance we get the trichotomy theorem: for sets $A$ and $B$, at least one of $A\leq B$ and
$B\leq A$ holds.