A sample proof: identifying lemmas and generating by derivatives

We illustrate some key features of proving by learning with generative models based on Dependent Type Theory in the ProvingGround project. Some details of the project are available on its website.

We have a set $M$ with a binary operation $*$, together with distinguished elements $e_l$ and $e_r$. We also have terms corresponding to the assumptions (a sketch of this context in the HoTT DSL follows the list):

  • $e_l * a = a$ for all $a\in M$ (left identity axiom),
  • $a * e_r = a$ for all $a\in M$ (right identity axiom),
  • equality is reflexive, symmetric and transitive.
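
As a rough sketch, such a context can be set up in the HoTT DSL as below; the val names here are illustrative (the library's MonoidSimple, imported later in this notebook, plays this role, and its internal names may differ).

import provingground._, HoTT._

val M   = "M" :: Type                  // the underlying type
val mul = "mul" :: M ->: M ->: M       // the binary operation *
val eqM = "eqM" :: M ->: M ->: Type    // equality as a type-valued relation
val el  = "e_l" :: M                   // candidate left identity
val er  = "e_r" :: M                   // candidate right identity

val a = "a" :: M
val b = "b" :: M
val c = "c" :: M

// Axiom terms, one for each assumption above:
val refl    = "axiom_refl"    :: a ~>: eqM(a)(a)
val sym     = "axiom_sym"     :: a ~>: (b ~>: (eqM(a)(b) ->: eqM(b)(a)))
val trans   = "axiom_trans"   :: a ~>: (b ~>: (c ~>: (eqM(a)(b) ->: (eqM(b)(c) ->: eqM(a)(c)))))
val leftId  = "axiom_leftId"  :: a ~>: eqM(mul(el)(a))(a)
val rightId = "axiom_rightId" :: a ~>: eqM(mul(a)(er))(a)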

From these, we conclude the basic algebraic result that $e_l = e_r$. Note that the proof in its fully expanded form is fairly long, involving the correct instantiation of each of the identity axioms, a use of symmetry of equality and a use of transitivity of equality. In particular, it has very low weight in a direct generative model.
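
Concretely, the expanded proof (this is the very term found at the end of this notebook) runs as follows:

  • instantiating the right identity axiom at $e_l$ gives $e_l * e_r = e_l$;
  • symmetry gives $e_l = e_l * e_r$ (the lemma picked out below);
  • instantiating the left identity axiom at $e_r$ gives $e_l * e_r = e_r$;
  • transitivity applied to the previous two statements gives $e_l = e_r$.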

However, our system finds the proof quite readily (taking well under a minute, even running synchronously in the notebook), due to a few key principles:

  • We use function application with unification, so we can apply symmetry of equality directly to a term of type $a = b$ even though the type of symmetry is $\Pi_{x : M}\Pi_{y: M} (x = y \to y = x)$, as the instantiations $x = a$ and $y = b$ are inferred (see the sketch after this list).
  • Most importantly, we learn that a lemma such as $e_l = e_l * e_r$, which is obtained by instantiation followed by symmetry, is non-trivial (while a statement obtained by instantiation alone is not), based on a cost combining the entropy of the initial distribution with the Kullback-Leibler divergence of theorems (inhabited types) from their proofs.
  • We also gain efficiency by using the derivatives of the generative models to find consequences of lemmas.
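
To illustrate the first of these principles, here is a minimal sketch, assuming the names from the sketch above (in particular `sym` for the symmetry axiom):

val x  = "x" :: M
val y  = "y" :: M
val pf = "pf" :: eqM(x)(y)

// Without unification, both instantiations must be supplied explicitly:
val flipped = sym(x)(y)(pf)   // has type eqM(y)(x)

// With unification, the generator applies `sym` directly to `pf`,
// inferring x and y by matching the type of `pf` against eqM(_)(_).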

We import the locally published core of ProvingGround.

In [1]:
import $ivy.`io.github.siddhartha-gadgil::provingground-core-jvm:0.1.0`
Out[1]:
import $ivy.$                                       

We import the relevant packages, including the one with our example.

In [2]:
import provingground._ , interface._, HoTT._
import learning._
import library._, MonoidSimple._
Out[2]:
import provingground._ , interface._, HoTT._

import learning._

import library._, MonoidSimple._
In [3]:
repl.pprinter.bind(translation.FansiShow.fansiPrint)

We set up a generator. For simplicity in this demo, we do not generate $\lambda$'s and $\Pi$-types.

In [4]:
val tg = TermGenParams(lmW=0, piW=0)
Out[4]:
tg: TermGenParams = TermGenParams(0.1, 0.1, 0.1, 0.0, 0.0, 0.05, 0.05, 0.05, 0.3, 0.5, 0.5)

We set up the initial state for learning.

In [5]:
val ts = TermState(dist1, dist1.map(_.typ))
Out[5]:
ts: TermState = TermState(
  FiniteDistribution(
    Vector(
      Weighted(e_l, 0.047619047619047616),
      Weighted(e_r, 0.047619047619047616),
      Weighted(mul, 0.047619047619047616),
      Weighted(eqM, 0.047619047619047616),
      Weighted(axiom_{eqM(a)(a)}, 0.047619047619047616),
      Weighted(axiom_{(eqM(a)(b) \to eqM(b)(a))}, 0.047619047619047616),
      Weighted(axiom_{(eqM(a)(b) \to (eqM(b)(c) \to eqM(a)(c)))}, 0.047619047619047616),
      Weighted(axiom_{eqM(mul(e_l)(a))(a)}, 0.047619047619047616),
      Weighted(axiom_{eqM(mul(a)(e_r))(a)}, 0.047619047619047616),
      Weighted(eqM, 0.2857142857142857),
      Weighted(mul, 0.2857142857142857)
    )
  ),
  FiniteDistribution(
    Vector(
      Weighted(M, 0.047619047619047616),
      Weighted(M, 0.047619047619047616),
      Weighted((M → (M → M)), 0.047619047619047616),
      Weighted((M → (M → 𝒰 )), 0.047619047619047616),
      Weighted(∏(a : M){ eqM(a)(a) }, 0.047619047619047616),
      Weighted(∏(a : M){ ∏(b : M){ (eqM(a)(b) → eqM(b)(a)) } }, 0.047619047619047616),
      Weighted(
        ∏(a : M){ ∏(b : M){ ∏(c : M){ (eqM(a)(b) → (eqM(b)(c) → eqM(a)(c))) } } },
        0.047619047619047616
      ),
      Weighted(∏(a : M){ eqM(mul(e_l)(a))(a) }, 0.047619047619047616),
      Weighted(∏(a : M){ eqM(mul(a)(e_r))(a) }, 0.047619047619047616),
      Weighted((M → (M → 𝒰 )), 0.2857142857142857),
      Weighted((M → (M → M)), 0.2857142857142857)
    )
  ),
  Vector(),
  FiniteDistribution(Vector()),
  FiniteDistribution(Vector()),
  Empty
)

We use the monix library for concurrency, though in this notebook we run code synchronously.

In [6]:
import monix.execution.Scheduler.Implicits.global
Out[6]:
import monix.execution.Scheduler.Implicits.global 

We synchronously evolve to the next state with our generative model.

In [7]:
val evT = tg.evolvedStateTask(ts, 0.0001)
Out[7]:
evT: monix.eval.Task[EvolvedState] = Map(
  FlatMap(
    FlatMap(Async(<function2>, true, true, true), monix.eval.Task$$Lambda$3824/1852322432@1aa95545),
    provingground.learning.TermGenParams$$Lambda$3825/60382130@5e31bbf4
  ),
  provingground.learning.TermGenParams$$Lambda$3826/1097822984@4e65d7fd,
  0
)
In [8]:
val ev = evT.runSyncUnsafe()
Out[8]:
ev: EvolvedState = EvolvedState(
  TermState(
    FiniteDistribution(
      Vector(
        Weighted(e_l, 0.047619047619047616),
        Weighted(e_r, 0.047619047619047616),
        Weighted(mul, 0.047619047619047616),
        Weighted(eqM, 0.047619047619047616),
        Weighted(axiom_{eqM(a)(a)}, 0.047619047619047616),
        Weighted(axiom_{(eqM(a)(b) \to eqM(b)(a))}, 0.047619047619047616),
        Weighted(axiom_{(eqM(a)(b) \to (eqM(b)(c) \to eqM(a)(c)))}, 0.047619047619047616),
        Weighted(axiom_{eqM(mul(e_l)(a))(a)}, 0.047619047619047616),
        Weighted(axiom_{eqM(mul(a)(e_r))(a)}, 0.047619047619047616),
        Weighted(eqM, 0.2857142857142857),
        Weighted(mul, 0.2857142857142857)
      )
    ),
    FiniteDistribution(
      Vector(
        Weighted(M, 0.047619047619047616),
        Weighted(M, 0.047619047619047616),
        Weighted((M → (M → M)), 0.047619047619047616),
        Weighted((M → (M → 𝒰 )), 0.047619047619047616),
        Weighted(∏(a : M){ eqM(a)(a) }, 0.047619047619047616),
        Weighted(∏(a : M){ ∏(b : M){ (eqM(a)(b) → eqM(b)(a)) } }, 0.047619047619047616),
        Weighted(
          ∏(a : M){ ∏(b : M){ ∏(c : M){ (eqM(a)(b) → (eqM(b)(c) → eqM(a)(c))) } } },
          0.047619047619047616
        ),
        Weighted(∏(a : M){ eqM(mul(e_l)(a))(a) }, 0.047619047619047616),
        Weighted(∏(a : M){ eqM(mul(a)(e_r))(a) }, 0.047619047619047616),
        Weighted((M → (M → 𝒰 )), 0.2857142857142857),
        Weighted((M → (M → M)), 0.2857142857142857)
      )
    ),
    Vector(),
    FiniteDistribution(Vector()),
    FiniteDistribution(Vector()),
    Empty
  ),
  TermState(
    FiniteDistribution(
      Vector(
        Weighted(
          axiom_{(eqM(a)(b) \to eqM(b)(a))}(mul(e_l)(e_r))(e_l)(axiom_{eqM(mul(a)(e_r))(a)}(e_l)),
          1.7403087380814608E-4
        ),
        Weighted(
          axiom_{(eqM(a)(b) \to (eqM(b)(c) \to eqM(a)(c)))}(e_r)(mul(e_r)(e_r)),
...

Identifying Lemmas

We identify lemmas as types that we know to be inhabited but that are not inhabited by terms in the generating set. We have kept the setup simple, so only a few candidate lemmas arise to begin with. The main step, however, is learning the weights of the lemmas.
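
As a rough sketch of the criterion (the call to `evolvedLemmaIters` below packages this together with the learning of weights):

// Types already proved by terms in the generating set:
val generatorTypes = ts.terms.support.map(_.typ)
// Inhabited types after evolution that no generator proves:
val lemmaTypes = ev.result.terms.support.map(_.typ).filterNot(generatorTypes.contains)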

The evolution of the weights is produced as an iterator for each lemma. As an iterator can be traversed only once, we store each as a stream, which is a lazy, memoized (and here infinite) list, letting us index into it repeatedly below.

In [9]:
import EntropyAtomWeight._ 
val its = evolvedLemmaIters(ev)
Out[9]:
import EntropyAtomWeight._ 

its: Vector[(Typ[Term], Iterator[Double])] = Vector(
  (eqM(e_r)(e_r), <iterator>),
  (eqM(mul(e_l)(e_r))(e_l), <iterator>),
  (eqM(e_l)(e_l), <iterator>),
  (eqM(mul(e_l)(e_l))(e_l), <iterator>),
  (eqM(e_l)(mul(e_l)(e_r)), <iterator>),
  (eqM(e_l)(mul(e_l)(e_l)), <iterator>),
  (eqM(mul(e_l)(e_r))(e_r), <iterator>),
  (eqM(mul(e_r)(e_r))(e_r), <iterator>),
  (eqM(e_r)(mul(e_l)(e_r)), <iterator>),
  (eqM(e_r)(mul(e_r)(e_r)), <iterator>)
)
In [10]:
val streams = its.map{case (tp, it) => (tp, it.toStream)}
Out[10]:
streams: Vector[(Typ[Term], Stream[Double])] = Vector(
  (
    eqM(e_r)(e_r),
    Stream(
      0.006722192160544933,
      0.006638579019869816,
      0.006556965648882864,
      0.006477280618977798,
      0.006399455860163608,
      0.006323426467130178,
      0.00624913051849513,
      0.0061765089082067995,
      0.006105505188167608,
      0.006036065421222929,
      0.005968138043733603,
      0.005901673737016389,
      0.005836625306996713,
      0.00577294757147239,
      0.005710597254436561,
      0.005649532886952888,
      0.005589714714117068,
      0.0055311046076758795,
      0.005473665983908948,
      0.005417363726409428,
      0.005362164113428051,
      0.005308034749470796,
      0.005254944500864192,
      0.005202863435023806,
      0.0051517627631813305,
      0.005101614786343904,
      0.005052392844275939,
      0.005004071267309053,
      0.004956625330799809,
      0.004910031212067913,
      0.0048642659496594,
      0.004819307404790365,
      0.004775134224836896,
      0.004731725808746164,
      0.004689062274252289,
      0.004647124426788525,
      0.004605893729994663,
      0.0045653522777253305,
      0.0045254827674711985,
      0.004486268475110892,
      0.004447693230916797,
      0.004409741396742982,
      0.004372397844328093,
      0.004335647934650351,
      0.004299477498275819,
...

We see three snapshots of the evolution of weights, at times $0$, $1000$ and $10000$. We also plot the evolution below.

In [11]:
streams.map{case (tp, s) => (tp, s(0), s(1000), s(10000))}
Out[11]:
res10: Vector[(Typ[Term], Double, Double, Double)] = Vector(
  (eqM(e_r)(e_r), 0.006722192160544933, 3.426395753639293E-4, 1.9692207219442676E-5),
  (eqM(mul(e_l)(e_r))(e_l), 0.002453937779914061, 1.4530469681356127E-4, 1.1368973439945227E-5),
  (eqM(e_l)(e_l), 0.006722192160544933, 3.426395753639293E-4, 1.9692207219442676E-5),
  (eqM(mul(e_l)(e_l))(e_l), 0.002453937779914061, 1.4530469681356127E-4, 1.1368973439945227E-5),
  (eqM(e_l)(mul(e_l)(e_r)), 2.1469429421471642E-4, 0.0017663914917681596, 0.0018047053112397848),
  (eqM(e_l)(mul(e_l)(e_l)), 2.1469429421471642E-4, 0.0017663914917681596, 0.0018047053112397848),
  (eqM(mul(e_l)(e_r))(e_r), 0.002453937779914061, 1.4530469681356127E-4, 1.1368973439945227E-5),
  (eqM(mul(e_r)(e_r))(e_r), 0.002453937779914061, 1.4530469681356127E-4, 1.1368973439945227E-5),
  (eqM(e_r)(mul(e_l)(e_r)), 2.1469429421471642E-4, 0.0017663914917681596, 0.0018047053112397848),
  (eqM(e_r)(mul(e_r)(e_r)), 2.1469429421471642E-4, 0.0017663914917681596, 0.0018047053112397848)
)
In [12]:
import $ivy.`org.plotly-scala::plotly-almond:0.5.2`

import plotly._
import plotly.element._
import plotly.layout._
import plotly.Almond._
Out[12]:
import $ivy.$                                      


import plotly._

import plotly.element._

import plotly.layout._

import plotly.Almond._
In [13]:
val scatters = streams.map{case (typ, s) => Scatter((1 to 50).map(i => (i * 50)), (1 to 50).map(i => s(i * 50)), name=typ.toString )}
val tsdata = Seq(scatters: _*)
Out[13]:
scatters: Vector[Scatter] = Vector(
  Scatter(
    Some(
      Doubles(
        Vector(
          50.0,
          100.0,
          150.0,
          200.0,
          250.0,
          300.0,
          350.0,
          400.0,
          450.0,
          500.0,
          550.0,
          600.0,
          650.0,
          700.0,
          750.0,
          800.0,
          850.0,
          900.0,
          950.0,
          1000.0,
          1050.0,
          1100.0,
          1150.0,
          1200.0,
          1250.0,
          1300.0,
          1350.0,
          1400.0,
          1450.0,
          1500.0,
          1550.0,
          1600.0,
          1650.0,
          1700.0,
          1750.0,
          1800.0,
          1850.0,
          1900.0,
          1950.0,
          2000.0,
          2050.0,
          2100.0,
          2150.0,
          2200.0,
...
tsdata: Seq[Scatter] = List(
  Scatter(
    Some(
      Doubles(
        Vector(
          50.0,
          100.0,
          150.0,
          200.0,
          250.0,
          300.0,
          350.0,
          400.0,
          450.0,
          500.0,
          550.0,
          600.0,
          650.0,
          700.0,
          750.0,
          800.0,
          850.0,
          900.0,
          950.0,
          1000.0,
          1050.0,
          1100.0,
          1150.0,
          1200.0,
          1250.0,
          1300.0,
          1350.0,
          1400.0,
          1450.0,
          1500.0,
          1550.0,
          1600.0,
          1650.0,
          1700.0,
          1750.0,
          1800.0,
          1850.0,
          1900.0,
          1950.0,
          2000.0,
          2050.0,
          2100.0,
          2150.0,
          2200.0,
...

Plot of the evolution

In [14]:
tsdata.plot(title="Evolution of the lemma coefficients")
Out[14]:
res13: String = "plot-388345784"

We pick the lemma that will give us the result. Note that four lemmas have high weight after the evolution, two of which will work, so one could evolve along all of them (as we do in the interactive code).

In [15]:
val pf = "lemma" :: toTyp(streams(4)._1)
pf.typ
Out[15]:
pf: U = lemma
res14_1: Typ[U] = eqM(e_l)(mul(e_l)(e_r))

Using the lemma

For quick further exploration, we use the derivatives of the recursive generating rules in the direction of the new lemma.
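
Schematically, if $F$ maps an initial distribution to the generated distribution, the tangent computation evaluates the directional derivative $DF(\mu)(\nu) = \lim_{t \to 0} (F(\mu + t\nu) - F(\mu))/t$ with $\nu$ concentrated on the lemma; the resulting state is thus supported on terms whose generation uses the lemma at least once.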

In [16]:
val tt = tg.nextTangStateTask(ev.result, TermState(FiniteDistribution.unif(pf), FiniteDistribution.empty), math.pow(10.0, -4))
Out[16]:
tt: monix.eval.Task[TermState] = FlatMap(
  FlatMap(Async(<function2>, true, true, true), monix.eval.Task$$Lambda$3824/1852322432@585b99f8),
  provingground.learning.TermGenParams$$Lambda$4515/437009502@428bd516
)
In [17]:
val tts = tt.runSyncUnsafe()
Out[17]:
tts: TermState = TermState(
  FiniteDistribution(
    Vector(
      Weighted(eqM(mul(e_l)(e_r))(mul(e_r)(mul(e_r)(e_r))), 3.962163928556959E-6),
      Weighted(mul(mul(e_r)(mul(e_l)(e_l)))(mul(e_l)(mul(e_r)(e_l))), 4.275649439611634E-7),
      Weighted(
        axiom_{(eqM(a)(b) \to (eqM(b)(c) \to eqM(a)(c)))}(mul(e_l)(mul(e_l)(e_l)))(mul(e_r)(mul(e_l)(e_l))),
        5.206961720491489E-8
      ),
      Weighted(
        axiom_{(eqM(a)(b) \to (eqM(b)(c) \to eqM(a)(c)))}(mul(e_r)(mul(e_l)(e_r)))(mul(e_l)(mul(e_l)(e_r))),
        5.206961720491489E-8
      ),
      Weighted(
        axiom_{(eqM(a)(b) \to eqM(b)(a))}(mul(e_r)(mul(e_l)(e_r)))(mul(e_r)(e_r)),
        1.5897514180770234E-7
      ),
      Weighted(
        axiom_{(eqM(a)(b) \to (eqM(b)(c) \to eqM(a)(c)))}(e_l)(e_l)(e_r),
        8.400760706952122E-6
      ),
      Weighted(
        axiom_{(eqM(a)(b) \to (eqM(b)(c) \to eqM(a)(c)))}(mul(e_l)(e_l))(e_l)(mul(e_l)(mul(e_r)(e_r)))(axiom_{eqM(mul(e_l)(a))(a)}(e_l)),
        1.1638204827134566E-7
      ),
      Weighted(
        axiom_{(eqM(a)(b) \to (eqM(b)(c) \to eqM(a)(c)))}(e_l)(e_r)(mul(e_r)(mul(e_r)(e_r))),
        2.372554561977124E-7
      ),
      Weighted(
        axiom_{(eqM(a)(b) \to (eqM(b)(c) \to eqM(a)(c)))}(mul(e_l)(e_r))(e_l)(mul(e_r)(mul(e_l)(mul(e_r)(e_r))))(axiom_{(eqM(a)(b) \to eqM(b)(a))}(e_l)(mul(e_l)(e_r))(lemma)),
        1.5411762537668637E-6
      ),
      Weighted(
        axiom_{(eqM(a)(b) \to (eqM(b)(c) \to eqM(a)(c)))}(e_r)(mul(e_r)(e_r)),
        4.061382346072717E-6
      ),
      Weighted(eqM(mul(e_r)(e_l))(mul(e_r)(mul(e_l)(e_l))), 3.962163928556959E-6),
      Weighted(
        axiom_{(eqM(a)(b) \to eqM(b)(a))}(mul(e_l)(e_r))(mul(e_l)(mul(e_l)(e_l))),
        4.831894088560347E-7
      ),
      Weighted(
        axiom_{(eqM(a)(b) \to eqM(b)(a))}(mul(e_l)(mul(e_l)(e_l)))(mul(e_l)(mul(e_l)(e_l))),
        5.206961720491489E-8
...

The Proof

The next step indeed gives a proof of the desired result $e_l = e_r$: transitivity applied to the lemma $e_l = e_l * e_r$ and to the instantiation $e_l * e_r = e_r$ of the left identity axiom. Note that this proof has a high weight of about 1.2%.

In [18]:
tts.terms.filter(_.typ == eqM(l)(r))
Out[18]:
res17: FiniteDistribution[Term] = FiniteDistribution(
  Vector(
    Weighted(
      axiom_{(eqM(a)(b) \to (eqM(b)(c) \to eqM(a)(c)))}(e_l)(mul(e_l)(e_r))(e_r)(lemma)(axiom_{eqM(mul(e_l)(a))(a)}(e_r)),
      0.012182672536523097
    )
  )
)