Lean together, Jan 2025

Real World (Pragmatic)

Autoformalization

Siddhartha Gadgil

Indian Institute of Science, Bangalore

https://github.com/siddhartha-gadgil/LeanAide
  • Autoformalization is the translation of natural language mathematics to Lean code.
  • This can help with formalization:
    • Interactively, helping with syntax and Mathlib terminology (having a search component).
    • By producing a draft translation (in the spirit of the Mathlib port), with LLMs also adding details.
  • Autoformalization also provides a bridge between the natural language reasoning of foundation models and Lean.
  • Lean helps both with ensuring correctness and organizing mathematics in a modular fashion.
  • AI/ML is deeply integrated with Lean in LeanAide.
  • This helps with:
    • Better Autoformalization.
    • Usability in Lean.
  • Tools and workflows are a central focus for us, and we make tools available at all stages.
  • We hope to iteratively improve and extend capabilities focusing on MWEs.
  • Ideally, Lean and AI can complement each other in tools for mathematicians without expertise in either.
  • We see our first demo (note that the responses are very fast because of caching).

LeanAide: Prompting

  • Suppose we are given the statement of a theorem.
  • We find (say) 20 documentation strings in Lean's mathematical library Mathlib that are closest to the statement (for OpenAI embeddings) and include them in the prompt as examples.
  • We query GPT-4 with the prompt for (say) 10 responses.

LeanAide: post-processing

  • We discard responses that are not valid Lean code (do not parse or do not elaborate).
  • However, the correct response may be preceded by some text or enclosed in a code block (in spite of instructions). So we attempt to parse and elaborate for each group of consecutive lines.
  • We also do some mild autocorrections, mainly for convention issues like Prime versus IsPrime.
  • We also use majority voting, grouping together responses that can be proved equivalent by the Aesop tactic.
  • If a vector space is $2$-dimensional, then it is finite dimensional.
    
    								[{"role": "system",
    								"content": "You are a coding assistant who translates from natural language to Lean Theorem Prover code following examples. Follow EXACTLY the examples given."},
    								{"role": "user",
    								"content": "If a vector space has a finite basis, then it is finite-dimensional. "},
    								{"role": "assistant",
    								"content": "∀ {K : Type u} {V : Type v} [inst : DivisionRing K] [inst_1 :
    								AddCommGroup V] [inst_2 : Module K V] {ι : Type w}\n [inst_3 : Finite ι], Basis ι
    								K V → FiniteDimensional K V"},
    								...
    								{"role": "user",
    								"content": "If a vector space has dimension ‘2‘ then it is finite dimensional."}]
    							  
  • Translation to Lean: If a vector space is $2$-dimensional, then it is finite dimensional.
    
    							theorem: ∀ {K : Type u} {V : Type v} 
    							[inst : DivisionRing K] [inst_1 : AddCommGroup V] 
    							[inst_2 : Module K V],
    								Module.rank K V = 2 → FiniteDimensional K V
    							  
  • This was one of our $40$ test silly statements (to avoid contamination).
  • We also had $40$ theorems and $40$ false statements.
  • On the test set of 120 example problems, we had a success rate of about 85%.
  • However, on the Putnam bench the success rate is about 50%.

Informalization and Data Augmentation

  • Informalization is describing a formal theorem, proof or some code in natural language.
  • We informalized all Lean theorems that either occured in more than 3 proofs or had docstrings.
  • To do this, we included in the prompt all the definitions in the statement of the theorem.
  • These have also been incorporated in autoformalization in LeanAide, along with the APIs from LeanSearch and Moogle.

Better translation?

  • We perform roundtrips and use a reasoning LLM to compare statements to both evaluate and filter.
  • Many failed translations were because of missing definitions, which could be fixed by:
    • Getting a language model to generate definitions.
    • Translating these and including in the prompt.
  • Even when the definition is present in the statement or in Mathlib, it helps to include it in the prompt.
  • Rewording the statements avoiding concise phrases like greatest odd divisor.
  • Meta-Fix: e.g. exists-unique with multiple variables.

Larger-scale Formalization

and

Natural language reaoning

Iterative formalization


					flowchart LR
					  P[(Problem)] --> reasoning[Reasoning by LLM]
					  reasoning --> pf[(Theorem and Proof)];
					  L[(Literature)] --> pf;
					  ans[(Notes/Answers)] --> ocr[OCR and formatting];
					  ocr --> pf;
					  pf --> |LLM|struct[(Structured JSON text)];
					  struct --> form[Autoformalization];
					  form --> lean[(Lean code)];
					  lean --> elab{Elaborate};
					  elab --> |Success|done((Done));
					  elab --> |Error|error((Fix error));
					  elab --> |Sorries|sorry((Fix Sorry));
					  sorry --> |Gap|P;
					  error --> |Details|pf;
					  error --> |Try again|form;
				  
  • We can start with a problem statement, a theorem in the literature or handwritten notes/answers.
  • We generate a text, then structured JSON by LLM.
  • This is translated to Lean programmatically and elaborated, with errors used to iterate.

Lean code via Structured Proofs

  • A language model is used to translate text to a JSON format with detailed instructions, with each object having a type field as "definition", "theorem", "assumption", "assertion" etc.
  • We translate theorems and assertions into statements including their context built from assumptions, lets etc and purge duplicates.
  • Assertions within a proof become have tactics.
  • Proofs end with (or consist of) a call to the Aesop search tactic with appropriate configuration; including "known results" and fallback to sorry.
  • My work has depended on:
    • Collaborators: Anand Rao Tadipatri, Ayush Agrawal, Navin Goyal, Ashvni Narayanan (phase 1); Anirudh Gupta, Vaishnavi Shirsath, Ajay Nair, Malhar Patel, Sushrut Jog (phase 3).
    • Compute/Models: Microsoft's Accelerating Foundation Model Research program; Google's Research credits; DST-FIST funding.

Hopes and Plans

  • Fix code generation.
  • Improve translation at sentence and proof level.
  • Handle issues regarding toolchains and embeddings to make using LeanAide straightforward.
  • Tasks and prompts to autoformalize at a larger scale by decomposing, summarizing etc.
  • Add commands for a Lean project to be better suited for AI and software assisted mathematical discovery.