Formal Methods
Mathematical proofs checked by computers.
The Pentium FDIV bug
-
Thomas Nicely, a professor of mathematics at Lynchburg College, had written code to enumerate primes,
twin primes, prime triplets, and prime quadruplets.
-
Nicely noticed some inconsistencies in the calculations shortly after adding a Pentium system to his group of computers
-
After eliminating other factors he reported the issue to Intel.
- This was a bug in the pentium chip due to missing entries in the lookup table used by the floating-point division circuitry.
- This bug had escaped testing.
-
This caused Intel to take a $475M charge against earnings and the management said "don't ever let this happen again".
- In 1996 they started proving properties of the Pentium processor FPU.
- Then in 1997 a bug was discovered in the FIST instruction (that converts floating point numbers to integers)
in the formally verified correct Pentium Pro FPU.
- It was a protocol mismatch between two blocks not accounted for in the informal arguments.
- So they went back to square one and during 1997-98 the verified the entire FPU against high-level specs
so that mismatches like the FIST bug could no longer escape.
- During 1997-99 the Pentium 4 processor was verified and there were no escapes.
TimSort
- Tim Peters developed the Timsort hybrid sorting algorithm in 2002 - this is the standard sorting algorithm in Java, Android, Python etc.
- It is a clever combination of ideas from merge sort and insertion sort, and designed to perform well on real world data.
- Specifically, data is grouped into runs, sequences that are already ordered, using insertion sort if needed.
Runs are merged as in merge sort.
- Run lengths are stored on a stack, which should be small for performance reasons.
- After having successfully verified Counting and Radix sort implementations in Java, the Dutch-German Envisage group
turned to proving the correctness of TimSort.
- After struggling to prove this, they realized they could not because the implementation was in fact wrong.
- Specifically the size of the stack for run lengths was estimated based on a (growth) property of run lengths, but this did not always hold.
- They proved that a modified version of the property holds, and a corresponding algorithm does work.
Conclusions from TimSort
- Formal methods are often classified as irrelevant and/or impracticable by practitioners.
This is not true: the researchers found and fixed a bug in a piece of software that is used by billions of users every single day.
- Even though the bug itself is unlikely to occur, it is easy to see how it could be used in an attack.
Using Formal methods
- Formal methods are mathematical proofs checked by computers.
- What are mathematical proofs?
- How do we know that the program checking the proof is correct?
- Why does everyone not do so already?
Why not always?
- Formal proving takes work, so we have a trade-off between productivity and safety.
- We have the same trade-off between dynamic and static typing; and often productivity wins.
- Better languages and tooling can help in getting safety without losing too much productivity.
Who guards the guards?
- Formal proof systems have a small trusted kernel.
- All proofs are verified, directly or indirectly, by the kernel.
- The trusted kernel itself should be checked independently and thoroughly several times.
- For example, lean theorem prover has independent type checkers (i.e. kernels) in three languages.
Foundations: Logic, Sets, Types
-
The usual foundations of mathematics are Set Theory as a theory in First-order logic.
- Computation can be founded on $\lambda$-calculus or two other equivalent models.
- Type theory gives common foundations integrating proofs and computations.
- All these are based on capabilities to perform basic operation
such as pattern matching on strings; and agreement on the meanings of such operations.
Foundations
of
Mathematics (and Computation)
-
Decide whether the following are true or false.
- $2 < 3$
- $ 3 > 4$
- $3 + 12$
- $3 += 4++ $
-
We can only ask whether an expression is true or false provided:
- It is a well-formed (meaningful) expression.
- It represents a statement (not an object such as a number or set).
Layers of foundations
- A language, given by rules of its grammar/syntax, that specifies well-formed expressions
as well as types of these expressions.
- Rules of deduction letting us conclude that a statement is true from other true statements.
- Axioms: statements that are known to be true.
- Rules are syntactic in nature, i.e., in terms of manipulating and pattern matching strings.
- Rules can be computed quickly.
Generative grammars
- The languages we consider are given by
- an initial vocabulary of expressions (with types),
- (introduction) rules to form new expressions from existing ones and assigning types to these,
- such that the rules depend only on the types of the expressions.
- Well-formed expressions are exactly those that can be formed from the vocabulary using the introduction rules.
- We can prove properties of such expressions by induction.
Natural and programming languages
- For example in English
- The vocabulary has nouns, verbs, adjectives etc.
- A single noun forms a noun phrase, e.g.
lecture
.
- An adjective followed by a noun phrase forms a noun phrase, e.g.
boring lecture
.
- Similarly, in most programming languages, if
x
and y
are expressions
with type Int
, we can form the expression x + y
of type Int
.
- We can also form the
Boolean
expression x < y
.
- Mathematical expressions and their types are determined by the syntax of a generative language.
- In particular, it is straightforward to generate all expressions, and to check the validity and type of an expression.
- Similarly valid deductions are those given by repeated application of syntactic rules.
- For example, the rule Modus Ponens says that for formulas $P$ and $Q$, from $P$ and $P \implies Q$ we can deduce $Q$.
- A specific class of such languages and rules of deduction constitute First-order logic,
on which the usual foundations of mathematics are based.
Theories and Sets
- A theory is a (first-order) language together with a collection of axioms, i.e., statements that are taken to be true.
- A specific first-order language and set of axioms constitute Set Theory.
- All of mathematics can be encoded in terms of Set Theory. However this is verbose and opaque, like Bytecode/assembly language .
- Hence any practical formalization depends on using some combination of a proof assistant and more expressive foundations.
Verification in the real world
- The real-world problem needs to be (correctly) modelled mathematically/logically.
- The mathematical model needs to be (correctly) embedded in Set theory or other appropriate foundations.
- By results of logic, statements that are deduced following the rules of syntax are true in the Set Theoretic model.
Data for a language
-
First-order languages (e.g. Arithmetic, Sets) are specified by (possibly empty) lists of
- (names of) constants, e.g. $0$, $1$, $\phi$.
- (names of) functions together with their degree/arities, e.g., addition ($+$) with degree $2$.
- (names of) relations together with their degree/arities, e.g.,
less than ($<$) and belongs to ($\in$) with degree $2$.
- (names of) variables - these can be assumed to be independent of the language.
Expressions
- We form two types of expressions:
- Terms: representing numbers, sets etc.
- Formulas: representing properties of numbers, sets, etc.
- Both terms and formulas may depend on variables.
Terms
Terms are expressions recursively given by the rules
- A variable $x$ is a term.
- A constant $c$ is a term.
- If $f$ is a function of degree $n$ and $t_1$, $t_2$, $\dots$, $t_n$ are terms then $f(t_1, t_2,\dots,t_n)$ is a term.
Formulas
Formulas are expressions recursively given by:
- If $t_1$ and $t_2$ are terms, then $t_1 = t_2$ is a formula.
- If $p$ is a relation of degree $n$ and $t_1$, $t_2$, $\dots$, $t_n$ are terms then $p(t_1, t_2,\dots,t_n)$ is a term.
- If $P$ and $Q$ are formulas then $P\wedge Q$, $P\vee Q$, $P\implies Q$, $P\iff Q$, and $\neg P$ are formulas.
- If $P$ is a formula and $x$ is a variable, then $\forall x P$ and $\exists x P$ are formulas.
- We recursively define free variables of terms and formulas. A formula is closed if it has no free variables.
Sets as collections
- A Set $S$ is a collection of objects, with no other structure.
- Thus, we have a relation $x \in S$, for any object $x$, corresponding to $x$ being a member of $S$.
- Further, if $S$ and $T$ are sets with the same members they are equal, i.e. if
$$\forall x\ x\in S \iff x\in T$$
then $S= T$ (Axiom of extension).
Language of Sets
- We have a first-order language that describes sets with
- a single relation $\in$ (is a member of) of degree $2$.
- a single constant $\phi$ (the empty set).
-
For example, the axiom of extensionality can be described in this language by the formula
$$\forall x\forall y (\forall a\ a\in x \iff a \in y) \implies x = y.$$
- Here everything is a set, including members of sets.
What collections?
- We need some way to construct sets.
- A natural way to try to do this is to say sets are collections of objects that satisfy some property, e.g.
$$\{p: \text{$p$ is a prime number} \}.$$
- We can make this formal by, for example, defining properties to be closed formulas
in the first-order language of sets (alternatively as closed formulas in some first-order language).
Russell's Paradox
- Assume we had sets corresponding to all closed formulas in the first-order language of sets.
- Consider the set
$$ S = \{X : X \notin X\}.$$
- Question: Is $S\in S$?
-
We see that
- if $S \notin S$ then $S \in S$,
- if $S\in S$ then $S \notin S$.
Sets from Axioms
- The axiom of specification allows us to define subsets of a set by a property.
- Thus, if we have previously defined the natural numbers $\mathbb{N}$, then we can define the set
$$\{p \in\mathbb{N} : \text{$p$ is a prime number} \}.$$
- Most axioms of set theory are statements giving us the existence of sets.
More axioms
- In our setup, we need an axiom for the empty set:
$\forall x (\lnot(x\in \phi))$
-
We have axioms giving the unions of pairs and of arbitrary collections of sets.
- The axiom of infinity, says that there exists a set $S$ such that
- $\phi\in S$.
- $(x\in S) \implies ((x\cup \{x\}) \in S)$
-
There are also two technical axioms: axioms of regularity and replacement, completing ZF.
Axiom of Choice and transfinite recursion
- The axiom of choice is an axiom independent of ZF
- including this gives ZFC.
- This says: let $(S_i)_{i\in I}$ be an indexed family of non-empty sets, where $I$ is the index set.
Then there is an indexed family of elements $(x_i)_{i\in I}$ such that $x_i\in S_i$ for all $i\in I$.
- In the presence of the axiom of choice, we can make constructions by transfinite recursion
-
For instance we get the trichotomy theorem: for sets $A$ and $B$, at least one of $|A|\leq |B|$ and
$|B|\leq |A|$ holds.