Proving Theorems using Incremental Learning and Hindsight Experience Replay

Authors: Eser Aygün, Ankit Anand, Laurent Orseau, Xavier Glorot, Stephen M. McAleer, Vlad Firoiu, Lei M. Zhang, Doina Precup, Shibl Mourad

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that provers trained in this way can outperform previous machine learning approaches and compete with the state of the art heuristic-based theorem prover E in its best configuration, on the popular benchmarks MPTP2078, M2k and Mizar40.
Researcher Affiliation | Collaboration | (1) DeepMind, London, UK; (2) Department of Information and Computer Science, University of California, Irvine, CA, USA; (3) Mila Quebec AI Institute, Montreal, QC, Canada; (4) McGill University, Montreal, QC, Canada.
Pseudocode | Yes | Algorithm 1 (Distributed incremental learning): launch starts a new process in parallel; for each conjecture, an instance of UBS decides the sequence of time limits for solving attempts. Algorithm 2 (Example sampling algorithm). (A hedged sketch of this loop appears below the table.)
Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We evaluate our approach on two popular benchmarks: MPTP2078 (Alama et al., 2014) and M2k (Kaliszyk et al., 2018) and compare it both with TRAIL (Abdelaziz et al., 2022) as well as with E prover (Schulz, 2002; Cruanes et al., 2019), one of the leading heuristic provers. Our proposed approach substantially outperforms TRAIL (Abdelaziz et al., 2022) on both datasets, surpasses E in the auto configuration with a 100s time limit, and is competitive with E in the autoschedule configuration with a 7 days time limit. In addition, our approach finds shorter proofs than E in approximately 99.5% of cases. We perform an ablation experiment to highlight specifically the role of hindsight experience replay in the results. We also compare performance using multilayer perceptrons (MLP), graph neural networks (GNN) and transformers with spectral and sequential features, all combined with the same HER approach. Our best results are obtained using transformers with spectral features.
Dataset Splits | No | Hence, we do not need to split the set of conjectures into train/test/validate sets because, if the system overfits to the proofs of a subset of conjectures, it will not be able to prove more conjectures.
Hardware Specification | Yes | The learners used Nvidia V100 GPUs with 16GB of memory.
Software Dependencies | Yes | We used E prover version 2.5 (Cruanes et al., 2019)... The model is trained using stochastic gradient descent with the Adam optimizer (Kingma & Ba, 2015). (See the E prover invocation sketch below the table.)
Experiment Setup | Yes | The hyperparameters for all experiments are given in Appendix C... The model is trained using stochastic gradient descent with the Adam optimizer (Kingma & Ba, 2015) with β1 = 0.9, β2 = 0.999 and ϵ = 10^-8. A subset of hyperparameters was selected by an initial investigation over shorter runs; the selected values are underlined in the paper: number of layers (N) in {3, 4, 5}, embedding size (d_k, d_v) in {64, 128, 256}, hash vector size in {16, 64, 256}, learning rate in {0.003, 0.001, 0.0003}, probability of dropout (P_drop) in {0., 0.1, 0.2, 0.3, 0.5, 0.7}. The other hyperparameters were fixed: the batch size is 2560, the number of attention heads (h) is 8, which leads to a dimensionality (d_model) of 512, and the inner layers have a dimensionality (d_ff) of 1024. (A hedged configuration sketch appears below the table.)
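The Pseudocode row above points to Algorithm 1 (distributed incremental learning) and Algorithm 2 (example sampling). The sketch below is a rough, single-process rendering of that loop, not the paper's code: the names attempt_proof, the trace fields, and policy.train_step are hypothetical stand-ins, and the per-conjecture UBS time-limit schedule is reduced to a comment.

```python
# A minimal, single-process sketch of incremental learning with hindsight
# experience replay, loosely following Algorithms 1 and 2 of the paper.
# Every interface here (attempt_proof, trace fields, policy.train_step) is
# a hypothetical stand-in, not the paper's implementation.
import random


def incremental_learning(conjectures, policy, attempt_proof,
                         num_rounds=10, batch_size=2560,
                         buffer_capacity=100_000):
    """Repeatedly attempt every conjecture and learn from both real proofs
    and hindsight goals extracted from failed attempts."""
    replay_buffer = []
    for _ in range(num_rounds):
        for conjecture in conjectures:
            # In the paper, a UBS instance per conjecture decides the
            # sequence of time limits; one attempt stands in for that here.
            trace = attempt_proof(conjecture, policy)
            if trace["proved"]:
                replay_buffer.extend(trace["proof_examples"])
            # Hindsight experience replay: clauses derived during the attempt
            # are theorems with known derivations, so the search is relabeled
            # with them as surrogate goals even when the conjecture itself
            # was not proved.
            replay_buffer.extend(trace["hindsight_examples"])
        replay_buffer = replay_buffer[-buffer_capacity:]
        if replay_buffer:
            batch = random.sample(replay_buffer,
                                  min(batch_size, len(replay_buffer)))
            policy.train_step(batch)
    return policy
```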
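The Software Dependencies row names E prover 2.5 as the baseline, and the Open Datasets row refers to its auto and autoschedule configurations. Below is a rough sketch of how such a baseline run might be launched from Python; problem.p is a placeholder problem file, and the flags shown (--auto, --cpu-limit, --proof-object) should be checked against the E 2.5 manual before use.

```python
# A rough sketch of invoking the E prover baseline in its "auto" configuration
# with a 100-second limit, mirroring the comparison quoted above. "problem.p"
# is a placeholder TPTP problem file; verify flag names against the E 2.5
# documentation.
import subprocess

result = subprocess.run(
    [
        "eprover",
        "--auto",            # let E pick its own strategy
        "--cpu-limit=100",   # 100 s time limit, as in the paper's comparison
        "--proof-object",    # ask E to output the proof it finds
        "problem.p",
    ],
    capture_output=True,
    text=True,
)
# E reports success in its output; the exact string to test for should be
# confirmed against the installed version.
print("proof found" if "Proof found" in result.stdout else "no proof within limit")
```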
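The Experiment Setup row lists fixed hyperparameters and sweep ranges, but the underlining that marks the selected swept values does not survive this plain-text extraction. The sketch below only collects the quoted numbers in one place: the tuples are sweep ranges rather than final choices, the learning rate passed to the optimizer is one swept value chosen for illustration, and PyTorch is used purely as an example framework (the paper does not name one here).

```python
# The quoted hyperparameters gathered into one configuration. Tuples are sweep
# ranges (the selected values were underlined in the paper, which is lost in
# plain text); scalars are the fixed settings. PyTorch is for illustration only.
import torch

hparams = {
    "num_layers": (3, 4, 5),                    # swept (N)
    "embedding_size": (64, 128, 256),           # swept (d_k = d_v)
    "hash_vector_size": (16, 64, 256),          # swept
    "learning_rate": (0.003, 0.001, 0.0003),    # swept
    "dropout": (0.0, 0.1, 0.2, 0.3, 0.5, 0.7),  # swept (P_drop)
    "batch_size": 2560,                         # fixed
    "num_heads": 8,                             # fixed (h)
    "d_model": 512,                             # fixed
    "d_ff": 1024,                               # fixed inner-layer size
}

# Adam exactly as quoted: beta1 = 0.9, beta2 = 0.999, epsilon = 10^-8.
# The Linear module is only a stand-in for the actual transformer model.
model = torch.nn.Linear(hparams["d_model"], hparams["d_model"])
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.001,  # one of the swept learning rates, not necessarily the selected one
    betas=(0.9, 0.999),
    eps=1e-8,
)
```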