Learning Formal Mathematics From Intrinsic Motivation

Authors: Gabriel Poesia, David Broman, Nick Haber, Noah Goodman

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on 3 axiomatic domains (propositional logic, arithmetic and group theory) demonstrate that our agent can bootstrap from only the axioms, self-improving in generating true and challenging conjectures and in finding proofs.
Researcher Affiliation | Academia | Departments of Computer Science, Psychology and Education, Stanford University, USA; EECS and Digital Futures, KTH Royal Institute of Technology, Sweden
Pseudocode | No | The paper describes the methods in prose but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks or figures.
Open Source Code | Yes | We provide a zip file containing all of our implementation in our supplementary material, and the configuration files and commands used to run the main experiments. This was also released on GitHub (https://github.com/gpoesia/minimo).
Open Datasets | Yes | For logic, we take the set of 35 theorems of propositional logic from Stephen Kleene's textbook Introduction to Metamathematics [20]. ... For arithmetic, we use the Natural Number Game [6]... We translate the statements into Peano, and evaluate our agents on their success rate on those problems...
Dataset Splits | No | The paper does not explicitly describe training, validation, and test splits with percentages, counts, or standard split names. It mentions "training on collected examples" and "unseen conjectures" but no formal splits.
Hardware Specification | Yes | Our training runs (5 iterations of generating and proving 200 conjectures in each) were done on 2 machines with 5 NVIDIA A40 40GB GPUs each.
Software Dependencies | No | The paper mentions "GPT-2-style character-level Transformer models" and the "AdamW optimizer" but does not provide version numbers for software dependencies such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | We represent our agents with GPT-2-style character-level Transformer models totalling approximately 8.45M parameters each (8 layers, 8 attention heads, hidden size 512, feed-forward size 2048, vocabulary size 128, absolute positional embeddings, with maximum context size of 1024). After each training iteration, we train the LM for a fixed number of 2000 steps of the AdamW optimizer (learning rate of 1e-4) with a dynamic batch size of at most 10000 characters (random examples are added until their padded sequence lengths add up to more than 10000).
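
For concreteness, the hyperparameters quoted in the Experiment Setup row can be written down as a short configuration sketch. The snippet below is illustrative only: it assumes Hugging Face's GPT2Config/GPT2LMHeadModel and torch.optim.AdamW, which the paper does not name, and the sample_dynamic_batch helper is a hypothetical reading of the "dynamic batch size" description rather than code from the Minimo repository; whether this exact configuration reproduces the reported ~8.45M parameter count is not verified here.

```python
# Illustrative sketch of the reported setup (assumed libraries: transformers, torch).
import random
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# GPT-2-style character-level Transformer: 8 layers, 8 attention heads,
# hidden size 512, feed-forward size 2048, vocabulary size 128,
# absolute positional embeddings, maximum context size 1024.
config = GPT2Config(
    vocab_size=128,
    n_positions=1024,
    n_embd=512,
    n_layer=8,
    n_head=8,
    n_inner=2048,
)
model = GPT2LMHeadModel(config)

# 2000 AdamW steps per training iteration, learning rate 1e-4.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
NUM_STEPS = 2000

def sample_dynamic_batch(examples, budget=10000):
    """Hypothetical dynamic batching: add random examples until their
    padded sequence lengths (batch size x longest example) exceed `budget`."""
    batch, max_len = [], 0
    while max_len * len(batch) <= budget:
        example = random.choice(examples)
        batch.append(example)
        max_len = max(max_len, len(example))
    return batch
```

Padding every example in a batch to the longest one means the character budget also bounds the padded sequence total per step, which is presumably why batches are sized by characters rather than by a fixed number of examples.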