DeepMath - Deep Sequence Models for Premise Selection

Authors: Geoffrey Irving, Christian Szegedy, Alexander A. Alemi, Niklas Eén, François Chollet, Josef Urban

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We study the effectiveness of neural sequence models for premise selection in automated theorem proving, one of the main bottlenecks in the formalization of mathematics. We propose a two-stage approach for this task that yields good results for the premise selection task on the Mizar corpus while avoiding the hand-engineered features of existing state-of-the-art models. To our knowledge, this is the first time deep learning has been applied to theorem proving on a large scale." (Abstract) ... "6 Experiments" (Section Title)
Researcher Affiliation | Collaboration | Alexander A. Alemi, Google Inc. (alemi@google.com); François Chollet, Google Inc. (fchollet@google.com); Niklas Eén, Google Inc. (een@google.com); Geoffrey Irving, Google Inc. (geoffreyi@google.com); Christian Szegedy, Google Inc. (szegedy@google.com); Josef Urban, Czech Technical University in Prague (josef.urban@gmail.com)
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. It provides architectural diagrams but not algorithmic steps in pseudocode format.
Open Source Code | No | The paper mentions using the "TensorFlow framework [1] and the Keras library [5]" (Section 6.4), but it does not state that the authors are releasing their own source code for the methodology described in the paper.
Open Datasets | Yes | "We use the Mizar Mathematical Library (MML) version 4.181.11473 as the formal corpus and E prover [32] version 1.9 as the underlying ATP system." (Section 3) ... "For training and evaluation we use a subset of 32,524 out of 57,917 theorems that are known to be provable by an ATP given the right set of premises." (Section 6.1)
Dataset Splits | Yes | "We split off a random 10% of these (3,124 statements) for testing and validation. Also, we held out 400 statements from the 3,124 for monitoring training progress, as well as for model and checkpoint selection." (Section 6.1)
Hardware Specification | Yes | "The neural networks were trained using asynchronous distributed stochastic gradient descent using the Adam optimizer [24] with up to 20 parallel NVIDIA K-80 GPU workers per model." (Section 6.4)
Software Dependencies | No | The paper mentions using the "TensorFlow framework [1] and the Keras library [5]" (Section 6.4), but it does not specify version numbers for these software dependencies, which is required for reproducibility.
Experiment Setup | Yes | "All our neural network models use the general architecture from Fig 3: a classifier on top of the concatenated embeddings of an axiom and a conjecture. The same classifier architecture was used for all models: a fully-connected neural network with one hidden layer of size 1024." (Section 6.3) ... "The character level models were trained with maximum sequence length 2048 characters, where the word-level (and definition embedding) based models had a maximum sequence length of 500 words." (Section 6.4) ... "For E prover, we used auto strategy with a soft time limit of 90 seconds, a hard time limit of 120 seconds, a memory limit of 4 GB, and a processed clauses limit of 500,000." (Section 6.5)
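The classifier head quoted above (Section 6.3) can be sketched concretely. This is not the authors' code: it is a minimal numpy forward pass assuming an illustrative embedding size of 256, random weights, ReLU activation, and a sigmoid output, none of which are specified in the excerpt except the hidden-layer size of 1024.

```python
import numpy as np

# Sketch (not the authors' implementation) of the shared classifier head:
# a fully connected network with one hidden layer of size 1024, applied to
# the concatenated embeddings of an axiom and a conjecture (Section 6.3).
# EMBED, the activations, and the weight initialization are assumptions.

rng = np.random.default_rng(0)
EMBED = 256    # assumed embedding dimension (not stated in the excerpt)
HIDDEN = 1024  # hidden-layer size stated in the paper

W1 = rng.normal(0.0, 0.01, (2 * EMBED, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.01, (HIDDEN, 1))
b2 = np.zeros(1)

def relevance(axiom_emb, conjecture_emb):
    """Score the relevance of an axiom to a conjecture as a value in (0, 1)."""
    x = np.concatenate([axiom_emb, conjecture_emb])  # concatenated embeddings
    h = np.maximum(0.0, x @ W1 + b1)                 # ReLU hidden layer
    logit = (h @ W2 + b2)[0]
    return 1.0 / (1.0 + np.exp(-logit))              # sigmoid output

score = relevance(rng.normal(size=EMBED), rng.normal(size=EMBED))
print(score)
```

In the paper the two input embeddings come from separate sequence encoders (character- or word-level); only the head on top of them is fixed across models.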
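The split described in Section 6.1 is simple enough to state as code. The sketch below assumes a plain uniform random sample and placeholder theorem identifiers; the paper does not specify the sampling mechanism or a seed.

```python
import random

# Sketch of the dataset split from Section 6.1 (assumed procedure, not the
# authors' code): from 32,524 ATP-provable theorems, split off a random 10%
# (3,124 statements) for testing and validation, and hold out 400 of those
# for monitoring training progress and for model/checkpoint selection.

random.seed(0)                                    # seed is an assumption
theorems = [f"thm_{i}" for i in range(32524)]     # placeholder theorem IDs

held_out = random.sample(theorems, 3124)          # ~10% test/validation
monitor = held_out[:400]                          # training-progress monitoring
test_val = held_out[400:]                         # remaining test/validation
held_set = set(held_out)
train = [t for t in theorems if t not in held_set]

print(len(train), len(test_val), len(monitor))    # 29400 2724 400
```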
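The E prover settings in Section 6.5 translate roughly to a command line like the one below. The exact flag names are an assumption based on E's documented options (the paper gives only the limits, not the invocation), and E expects the memory limit in megabytes.

```shell
# Hypothetical invocation matching the limits quoted from Section 6.5;
# flag names are assumptions, not taken from the paper.
eprover --auto \
    --soft-cpu-limit=90 \
    --cpu-limit=120 \
    --memory-limit=4096 \
    --processed-clauses-limit=500000 \
    problem.p
```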