DeepMath - Deep Sequence Models for Premise Selection
Authors: Geoffrey Irving, Christian Szegedy, Alexander A. Alemi, Niklas Eén, François Chollet, Josef Urban
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study the effectiveness of neural sequence models for premise selection in automated theorem proving, one of the main bottlenecks in the formalization of mathematics. We propose a two-stage approach for this task that yields good results for the premise selection task on the Mizar corpus while avoiding the hand-engineered features of existing state-of-the-art models. To our knowledge, this is the first time deep learning has been applied to theorem proving on a large scale. (Abstract) ... 6 Experiments (Section Title) |
| Researcher Affiliation | Collaboration | Alexander A. Alemi Google Inc. alemi@google.com François Chollet Google Inc. fchollet@google.com Niklas Een Google Inc. een@google.com Geoffrey Irving Google Inc. geoffreyi@google.com Christian Szegedy Google Inc. szegedy@google.com Josef Urban Czech Technical University in Prague josef.urban@gmail.com |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. It provides architectural diagrams but not algorithmic steps in pseudocode format. |
| Open Source Code | No | The paper mentions using the 'TensorFlow framework [1] and the Keras library [5]' (Section 6.4), but it does not state that the authors are releasing their own source code for the methodology described in the paper. |
| Open Datasets | Yes | We use the Mizar Mathematical Library (MML) version 4.181.11473 as the formal corpus and E prover [32] version 1.9 as the underlying ATP system. (Section 3) ... For training and evaluation we use a subset of 32,524 out of 57,917 theorems that are known to be provable by an ATP given the right set of premises. (Section 6.1) |
| Dataset Splits | Yes | We split off a random 10% of these (3,124 statements) for testing and validation. Also, we held out 400 statements from the 3,124 for monitoring training progress, as well as for model and checkpoint selection. (Section 6.1) |
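The split counts quoted above are internally consistent: 10% of the 32,524 provable theorems gives 3,124 held-out statements, of which 400 are reserved for monitoring and model selection. A minimal sketch of such a split (the counts come from the paper; the use of `random.sample` and the seed are illustrative assumptions, not the authors' procedure):

```python
import random

def split_theorems(theorems, seed=0):
    """Hypothetical reconstruction of the Section 6.1 split."""
    rng = random.Random(seed)
    held_out = rng.sample(theorems, 3124)   # random 10% for testing/validation
    monitor = held_out[:400]                # monitoring + checkpoint selection
    test = held_out[400:]                   # remaining evaluation statements
    held = set(held_out)
    train = [t for t in theorems if t not in held]
    return train, monitor, test

theorems = list(range(32524))  # stand-ins for the 32,524 provable theorems
train, monitor, test = split_theorems(theorems)
print(len(train), len(monitor), len(test))  # 29400 400 2724
```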
| Hardware Specification | Yes | The neural networks were trained using asynchronous distributed stochastic gradient descent using the Adam optimizer [24] with up to 20 parallel NVIDIA K-80 GPU workers per model. (Section 6.4) |
| Software Dependencies | No | The paper mentions using the 'TensorFlow framework [1] and the Keras library [5]' (Section 6.4), but it does not specify version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | All our neural network models use the general architecture from Fig 3: a classifier on top of the concatenated embeddings of an axiom and a conjecture. The same classifier architecture was used for all models: a fully-connected neural network with one hidden layer of size 1024. (Section 6.3) ... The character level models were trained with maximum sequence length 2048 characters, where the word-level (and definition embedding) based models had a maximum sequence length of 500 words. (Section 6.4) ... For E prover, we used auto strategy with a soft time limit of 90 seconds, a hard time limit of 120 seconds, a memory limit of 4 GB, and a processed clauses limit of 500,000. (Section 6.5) |
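The shared classifier described in Section 6.3 can be sketched as follows. This is not the authors' code: the embedding dimension (256), the ReLU nonlinearity, the sigmoid output, and the random initialization are all assumptions; only the structure (a one-hidden-layer fully-connected network of size 1024 over the concatenated axiom and conjecture embeddings) comes from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim = 256  # assumed embedding size; not stated in the quoted excerpt

def classifier(axiom_emb, conjecture_emb, params):
    """Score one (axiom, conjecture) pair, per the Fig. 3 architecture."""
    w1, b1, w2, b2 = params
    # Concatenate the two embeddings into a single feature vector.
    x = np.concatenate([axiom_emb, conjecture_emb], axis=-1)
    h = np.maximum(x @ w1 + b1, 0.0)    # hidden layer of size 1024 (ReLU assumed)
    logit = h @ w2 + b2                 # single relevance score
    return 1.0 / (1.0 + np.exp(-logit))  # probability the axiom is a useful premise

params = (
    rng.normal(0.0, 0.01, (2 * embed_dim, 1024)), np.zeros(1024),
    rng.normal(0.0, 0.01, (1024,)), 0.0,
)
p = classifier(rng.normal(size=embed_dim), rng.normal(size=embed_dim), params)
print(0.0 <= p <= 1.0)  # True
```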