Learning Formal Mathematics From Intrinsic Motivation
Authors: Gabriel Poesia, David Broman, Nick Haber, Noah Goodman
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on 3 axiomatic domains (propositional logic, arithmetic and group theory) demonstrate that our agent can bootstrap from only the axioms, self-improving in generating true and challenging conjectures and in finding proofs. |
| Researcher Affiliation | Academia | Departments of Computer Science, Psychology, and Education, Stanford University, USA; EECS and Digital Futures, KTH Royal Institute of Technology, Sweden |
| Pseudocode | No | The paper describes the methods in prose but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks or figures. |
| Open Source Code | Yes | We provide a zip file containing all of our implementation in our supplementary material, and the configuration files and commands used to run the main experiments. This was also released on Github (https://github.com/gpoesia/minimo). |
| Open Datasets | Yes | For logic, we take the set of 35 theorems of propositional logic from Stephen Kleene's textbook Introduction to Metamathematics [20]. ... For arithmetic, we use the Natural Number Game [6]... We translate the statements into Peano, and evaluate our agents on their success rate on those problems... |
| Dataset Splits | No | The paper does not explicitly describe specific training, validation, and test splits with percentages, counts, or explicit standard split names. It mentions "training on collected examples" and "unseen conjectures" but not formal splits. |
| Hardware Specification | Yes | Our training runs (5 iterations of generating and proving 200 conjectures in each) were done on 2 machines with 5 NVIDIA A40 40GB GPUs each. |
| Software Dependencies | No | The paper mentions "GPT-2-style character-level Transformer models" and the "AdamW optimizer" but does not provide specific version numbers for software dependencies such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | We represent our agents with GPT-2-style character-level Transformer models totalling approximately 8.45M parameters each (8 layers, 8 attention heads, hidden size 512, feed-forward size 2048, vocabulary size 128, absolute positional embeddings, with maximum context size of 1024). After each training iteration, we train the LM for a fixed number of 2000 steps of the AdamW optimizer (learning rate of 1e-4) with a dynamic batch size of at most 10000 characters (random examples are added until their padded sequence lengths add up to more than 10000). (See the configuration sketch below this table.) |
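
The architecture and optimizer settings quoted in the Experiment Setup row are concrete enough to reconstruct a configuration sketch. The snippet below is a hypothetical illustration, not code from the minimo repository: it assumes PyTorch and the Hugging Face `transformers` library (the paper names neither), and `build_dynamic_batch` is our own name for the dynamic-batching rule described in the quote.

```python
# Hypothetical reconstruction of the reported training configuration;
# not the authors' implementation.
from transformers import GPT2Config, GPT2LMHeadModel
from torch.optim import AdamW

# Architecture reported in the paper: 8 layers, 8 attention heads,
# hidden size 512, feed-forward size 2048, a 128-character vocabulary,
# absolute positional embeddings, and a maximum context of 1024 tokens.
config = GPT2Config(
    vocab_size=128,
    n_positions=1024,
    n_embd=512,
    n_layer=8,
    n_head=8,
    n_inner=2048,
)
model = GPT2LMHeadModel(config)

# Optimizer settings reported in the paper: 2000 AdamW steps per training
# iteration with a learning rate of 1e-4.
optimizer = AdamW(model.parameters(), lr=1e-4)
TRAIN_STEPS_PER_ITERATION = 2000
MAX_BATCH_CHARS = 10000

def build_dynamic_batch(shuffled_examples, max_chars=MAX_BATCH_CHARS):
    """Add random examples until their padded lengths add up to more than max_chars."""
    batch, longest = [], 0
    for example in shuffled_examples:  # character strings, assumed pre-shuffled
        batch.append(example)
        longest = max(longest, len(example))
        # Padded size of the batch so far: every example is padded to the longest one.
        if longest * len(batch) > max_chars:
            break
    return batch
```

With hidden size 512 this configuration is small enough to train on a single GPU; the dynamic batch helper mirrors the quoted rule of growing each batch until the padded character count exceeds the 10000-character budget.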