reproducibilityindex.ai

Premise Selection for Theorem Proving by Deep Graph Embedding

Authors: Mingzhe Wang, Yihe Tang, Jian Wang, Jia Deng

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We perform experiments using the Hol Step dataset [10], which consists of over two million conjecturestatement pairs that can be used to evaluate premise selection. The results show that our graphembedding approach achieves large improvement over sequence-based models. In particular, our approach improves the state-of-the-art accuracy on Hol Step by 7.3%.
Researcher Affiliation	Academia	Mingzhe Wang Yihe Tang Jian Wang Jia Deng University of Michigan, Ann Arbor
Pseudocode	No	The paper describes methods and equations but does not contain a dedicated pseudocode or algorithm block.
Open Source Code	No	The paper does not provide any explicit statements or links indicating the availability of open-source code for the described methodology.
Open Datasets	Yes	We evaluate our approach on the Hol Step dataset [10], a recently introduced benchmark for evaluating machine learning approaches for theorem proving. It was constructed from the proof trace ﬁles of the HOL Light theorem prover [7] on its multivariate analysis library [48] and the formal proof of the Kepler conjecture.
Dataset Splits	Yes	The dataset contains 11,410 conjectures, including 9,999 in the training set and 1,411 in the test set. There are 2,209,076 conjecture-statement pairs in total. We hold out 700 conjectures from the training set as the validation set to tune hyperparameters and perform ablation analysis.
Hardware Specification	No	The paper does not provide specific details about the hardware used for running the experiments.
Software Dependencies	No	The paper mentions RMSProp as an optimizer but does not specify versions for software libraries or dependencies.
Experiment Setup	Yes	We train our networks using RMSProp [50] with 0.001 learning rate and 1 × 10−4 weight decay. We lower the learning rate by 3X after each epoch. We train all models for ﬁve epochs and all networks converge after about three or four epochs.