Proving Theorems using Incremental Learning and Hindsight Experience Replay

Authors: Eser Aygün, Ankit Anand, Laurent Orseau, Xavier Glorot, Stephen M. McAleer, Vlad Firoiu, Lei M. Zhang, Doina Precup, Shibl Mourad

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that provers trained in this way can outperform previous machine learning approaches and compete with the state of the art heuristic-based theorem prover E in its best configuration, on the popular benchmarks MPTP2078, M2k and Mizar40.
Researcher Affiliation | Collaboration | (1) DeepMind, London, UK; (2) Department of Information and Computer Science, University of California, Irvine, CA, USA; (3) Mila Quebec AI Institute, Montreal, QC, Canada; (4) McGill University, Montreal, QC, Canada.
Pseudocode | Yes | Algorithm 1 (Distributed incremental learning): launch starts a new process in parallel; for each conjecture, an instance of UBS decides the sequence of time limits for solving attempts. Algorithm 2 (Example sampling algorithm). (A hedged sketch of this loop appears below the table.)
Open Source Code | No | The paper does not provide an explicit statement about releasing its own source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We evaluate our approach on two popular benchmarks: MPTP2078 (Alama et al., 2014) and M2k (Kaliszyk et al., 2018) and compare it both with TRAIL (Abdelaziz et al., 2022) as well as with E prover (Schulz, 2002; Cruanes et al., 2019), one of the leading heuristic provers. Our proposed approach substantially outperforms TRAIL (Abdelaziz et al., 2022) on both datasets, surpasses E in the auto configuration with a 100s time limit, and is competitive with E in the autoschedule configuration with a 7 days time limit. In addition, our approach finds shorter proofs than E in approximately 99.5% of cases. We perform an ablation experiment to highlight specifically the role of hindsight experience replay in the results. We also compare performance using multilayer perceptrons (MLP), graph neural networks (GNN) and transformers with spectral and sequential features, all combined with the same HER approach. Our best results are obtained using transformers with spectral features.
Dataset Splits | No | Hence, we do not need to split the set of conjectures into train/test/validate sets because, if the system overfits to the proofs of a subset of conjectures, it will not be able to prove more conjectures.
Hardware Specification | Yes | The learners used Nvidia V100 GPUs with 16GB of memory.
Software Dependencies | Yes | We used E prover version 2.5 (Cruanes et al., 2019)... The model is trained using stochastic gradient descent with the Adam optimizer (Kingma & Ba, 2015). (See the E prover invocation sketch below the table.)
Experiment Setup | Yes | The hyperparameters for all experiments are given in Appendix C... The model is trained using stochastic gradient descent with the Adam optimizer (Kingma & Ba, 2015) with β1 = 0.9, β2 = 0.999 and ϵ = 10^-8. A subset of hyperparameters was selected by an initial investigation over shorter runs; the selected values are underlined in the paper: number of layers (N) in {3, 4, 5}, embedding size (d_k, d_v) in {64, 128, 256}, hash vector size in {16, 64, 256}, learning rate in {0.003, 0.001, 0.0003}, probability of dropout (P_drop) in {0., 0.1, 0.2, 0.3, 0.5, 0.7}. The other hyperparameters were fixed: the batch size is 2560, the number of attention heads (h) is 8, which leads to a dimensionality (d_model) of 512, and the inner layers have a dimensionality (d_ff) of 1024. (A hedged configuration sketch appears below the table.)
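The Pseudocode row above points to Algorithm 1 (distributed incremental learning) and Algorithm 2 (example sampling). The sketch below is a rough, single-process rendering of that loop, not the paper's code: the names attempt_proof, the trace fields, and policy.train_step are hypothetical stand-ins, and the per-conjecture UBS time-limit schedule is reduced to a comment.

```python
# A minimal, single-process sketch of incremental learning with hindsight
# experience replay, loosely following Algorithms 1 and 2 of the paper.
# Every interface here (attempt_proof, trace fields, policy.train_step) is
# a hypothetical stand-in, not the paper's implementation.
import random


def incremental_learning(conjectures, policy, attempt_proof,
                         num_rounds=10, batch_size=2560,
                         buffer_capacity=100_000):
    """Repeatedly attempt every conjecture and learn from both real proofs
    and hindsight goals extracted from failed attempts."""
    replay_buffer = []
    for _ in range(num_rounds):
        for conjecture in conjectures:
            # In the paper, a UBS instance per conjecture decides the
            # sequence of time limits; one attempt stands in for that here.
            trace = attempt_proof(conjecture, policy)
            if trace["proved"]:
                replay_buffer.extend(trace["proof_examples"])
            # Hindsight experience replay: clauses derived during the attempt
            # are theorems with known derivations, so the search is relabeled
            # with them as surrogate goals even when the conjecture itself
            # was not proved.
            replay_buffer.extend(trace["hindsight_examples"])
        replay_buffer = replay_buffer[-buffer_capacity:]
        if replay_buffer:
            batch = random.sample(replay_buffer,
                                  min(batch_size, len(replay_buffer)))
            policy.train_step(batch)
    return policy
```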
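The Software Dependencies row names E prover 2.5 as the baseline, and the Open Datasets row refers to its auto and autoschedule configurations. Below is a rough sketch of how such a baseline run might be launched from Python; problem.p is a placeholder problem file, and the flags shown (--auto, --cpu-limit, --proof-object) should be checked against the E 2.5 manual before use.

```python
# A rough sketch of invoking the E prover baseline in its "auto" configuration
# with a 100-second limit, mirroring the comparison quoted above. "problem.p"
# is a placeholder TPTP problem file; verify flag names against the E 2.5
# documentation.
import subprocess

result = subprocess.run(
    [
        "eprover",
        "--auto",            # let E pick its own strategy
        "--cpu-limit=100",   # 100 s time limit, as in the paper's comparison
        "--proof-object",    # ask E to output the proof it finds
        "problem.p",
    ],
    capture_output=True,
    text=True,
)
# E reports success in its output; the exact string to test for should be
# confirmed against the installed version.
print("proof found" if "Proof found" in result.stdout else "no proof within limit")
```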
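The Experiment Setup row lists fixed hyperparameters and sweep ranges, but the underlining that marks the selected swept values does not survive this plain-text extraction. The sketch below only collects the quoted numbers in one place: the tuples are sweep ranges rather than final choices, the learning rate passed to the optimizer is one swept value chosen for illustration, and PyTorch is used purely as an example framework (the paper does not name one here).

```python
# The quoted hyperparameters gathered into one configuration. Tuples are sweep
# ranges (the selected values were underlined in the paper, which is lost in
# plain text); scalars are the fixed settings. PyTorch is for illustration only.
import torch

hparams = {
    "num_layers": (3, 4, 5),                    # swept (N)
    "embedding_size": (64, 128, 256),           # swept (d_k = d_v)
    "hash_vector_size": (16, 64, 256),          # swept
    "learning_rate": (0.003, 0.001, 0.0003),    # swept
    "dropout": (0.0, 0.1, 0.2, 0.3, 0.5, 0.7),  # swept (P_drop)
    "batch_size": 2560,                         # fixed
    "num_heads": 8,                             # fixed (h)
    "d_model": 512,                             # fixed
    "d_ff": 1024,                               # fixed inner-layer size
}

# Adam exactly as quoted: beta1 = 0.9, beta2 = 0.999, epsilon = 10^-8.
# The Linear module is only a stand-in for the actual transformer model.
model = torch.nn.Linear(hparams["d_model"], hparams["d_model"])
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=0.001,  # one of the swept learning rates, not necessarily the selected one
    betas=(0.9, 0.999),
    eps=1e-8,
)
```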