Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Premise Selection for Theorem Proving by Deep Graph Embedding
Authors: Mingzhe Wang, Yihe Tang, Jian Wang, Jia Deng
NeurIPS 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform experiments using the Hol Step dataset [10], which consists of over two million conjecturestatement pairs that can be used to evaluate premise selection. The results show that our graphembedding approach achieves large improvement over sequence-based models. In particular, our approach improves the state-of-the-art accuracy on Hol Step by 7.3%. |
| Researcher Affiliation | Academia | Mingzhe Wang Yihe Tang Jian Wang Jia Deng University of Michigan, Ann Arbor |
| Pseudocode | No | The paper describes methods and equations but does not contain a dedicated pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | We evaluate our approach on the Hol Step dataset [10], a recently introduced benchmark for evaluating machine learning approaches for theorem proving. It was constructed from the proof trace ο¬les of the HOL Light theorem prover [7] on its multivariate analysis library [48] and the formal proof of the Kepler conjecture. |
| Dataset Splits | Yes | The dataset contains 11,410 conjectures, including 9,999 in the training set and 1,411 in the test set. There are 2,209,076 conjecture-statement pairs in total. We hold out 700 conjectures from the training set as the validation set to tune hyperparameters and perform ablation analysis. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions RMSProp as an optimizer but does not specify versions for software libraries or dependencies. |
| Experiment Setup | Yes | We train our networks using RMSProp [50] with 0.001 learning rate and 1 Γ 10β4 weight decay. We lower the learning rate by 3X after each epoch. We train all models for ο¬ve epochs and all networks converge after about three or four epochs. |