Reinforcement Learning of Theorem Proving

Authors: Cezary Kaliszyk, Josef Urban, Henryk Michalewski, Miroslav Olšák

NeurIPS 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "The evaluation is done on two datasets of first-order problems exported from the Mizar Mathematical Library [13] by the MPTP system [45]." |
| Researcher Affiliation | Collaboration | Cezary Kaliszyk (University of Innsbruck); Josef Urban (Czech Technical University in Prague); Henryk Michalewski (University of Warsaw; Institute of Mathematics of the Polish Academy of Sciences; deepsense.ai); Miroslav Olšák (Charles University) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks; the methods are described in prose. |
| Open Source Code | No | The paper does not provide concrete access to source code for the described methodology (e.g., a repository link, an explicit code-release statement, or code in supplementary materials). It mentions the implementation size but not its availability. |
| Open Datasets | Yes | "The evaluation is done on two datasets of first-order problems exported from the Mizar Mathematical Library [13] by the MPTP system [45]. The larger Miz40 dataset consists of 32524 problems..." (https://github.com/JUrban/deepmath) |
| Dataset Splits | No | "We therefore randomly split Miz40 into a training set of 29272 problems and a testing set of 3252 problems." This describes training and testing splits but does not explicitly mention a separate validation split, nor how tuning on the M2k dataset was performed with respect to splits. |
| Hardware Specification | Yes | "All problems are run on the same hardware and with the same memory limits." (Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz with 256 GB RAM) |
| Software Dependencies | No | The paper mentions software components such as XGBoost, LIBLINEAR, and Scikit-learn, but does not provide version numbers for these dependencies. |
| Experiment Setup | Yes | "The final values that we use both for policy and value learning are as follows: maximum number of iterations = 400, maximum tree depth = 9, ETA (learning rate) = 0.3, early stopping rounds = 200, lambda (weight regularization) = 1.5." |
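The reported Miz40 split (29272 training / 3252 testing problems out of 32524) corresponds to holding out roughly 10% at random. A minimal Python sketch of such a split; the function name, seed, and exact test fraction are our assumptions, since the paper does not describe its shuffling procedure:

```python
import random

def split_problems(problems, test_fraction=0.1, seed=0):
    """Randomly split a list of problems into (train, test).

    Mirrors the roughly 9:1 Miz40 split reported in the paper
    (29272 training / 3252 testing out of 32524 problems); the
    seed and exact fraction here are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = list(problems)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# With 32524 problems, a 10% holdout reproduces the reported sizes.
train, test = split_problems(range(32524))
```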
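The quoted XGBoost settings can be collected into a parameter dictionary as they would be passed to `xgboost.train`. This is a sketch, not the authors' code: the objective and the construction of the `DMatrix` inputs are not specified in the paper, so they are omitted or marked as assumptions below.

```python
# Hyperparameters for the policy/value learners, as quoted in the report.
PARAMS = {
    "max_depth": 9,   # maximum tree depth
    "eta": 0.3,       # learning rate
    "lambda": 1.5,    # L2 weight regularization
}
NUM_BOOST_ROUND = 400        # maximum number of iterations
EARLY_STOPPING_ROUNDS = 200

def train_booster(dtrain, dvalid):
    """Train an XGBoost booster with the settings above.

    `dtrain`/`dvalid` are assumed to be xgboost.DMatrix objects built
    elsewhere; xgboost is imported lazily so the configuration can be
    inspected without the library installed.
    """
    import xgboost as xgb
    return xgb.train(
        PARAMS,
        dtrain,
        num_boost_round=NUM_BOOST_ROUND,
        evals=[(dvalid, "valid")],
        early_stopping_rounds=EARLY_STOPPING_ROUNDS,
    )
```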