Graph2Tac: Online Representation Learning of Formal Math Concepts

Authors: Lasse Blaauwbroek, Mirek Olšák, Jason Rute, Fidel Ivan Schaposnik Massolo, Jelle Piepenbrock, Vasily Pestun

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively benchmark two such online solvers implemented in the Tactician platform for the Coq proof assistant: First, Tactician's online k-nearest neighbor solver, which can learn from recent proofs, shows a 1.72× improvement in theorems proved over an offline equivalent. (The online k-NN idea is illustrated in the first sketch below the table.)
Researcher Affiliation | Collaboration | IHES, University of Cambridge, IBM Research, Radboud University, Czech Technical University in Prague
Pseudocode | No | The paper describes the model architecture and its operational steps (e.g., Figure 4, Section 3.2, Appendix E) using textual descriptions and mathematical formulas, but it does not include a formally labeled 'Pseudocode' block or 'Algorithm' listing.
Open Source Code | Yes | Our code is publicly available, and every step of our work is fully reproducible (Appendix A). ... Graph2Tac: Code to train and evaluate the GNN. GitHub: https://github.com/IBM/graph2tac
Open Datasets | Yes | In this paper, we use a freely available dataset of Coq data (Blaauwbroek, 2023). Information on the construction of the dataset can be found in Blaauwbroek (2024) and Blaauwbroek et al. (2024).
Dataset Splits | No | The paper states 'We divide packages into training and test where no test package depends on a training package. To do so, we induced a random topological order on the Coq packages, with regard to the dependency graph. The resulting list was then split such that the average percentage of theorems and of proof states in the training split is close to 90% (in our case, it is 91.3% of all theorems and 88.8% of all proof states).' It also mentions monitoring 'validation accuracy' in Appendix F, but it does not specify a precise percentage or count for a validation split. (This split procedure is illustrated in the second sketch below the table.)
Hardware Specification | Yes | Graph2Tac graph models were trained on 8 CPUs (with 128 GB RAM) and 2 NVIDIA A100 GPU cards for about 20 days. ... The GPU evaluations and the training of the transformer were performed on machines with two 20-core Intel E5-2698 v4 CPUs, a working memory of 512 GB RAM, and 8 NVIDIA Tesla V100 GPU cards.
Software Dependencies | No | The paper mentions using 'ragged tensors in TensorFlow (Abadi et al., 2015)' and 'graph tensors in TF-GNN (Ferludin et al., 2022)'. It also states 'The transformer was a GPT2-style model (Radford et al., 2019) from the Huggingface Transformers library.' However, it does not provide specific version numbers for TensorFlow, TF-GNN, or the Huggingface Transformers library.
Experiment Setup | Yes | We train on batches of 512 definitions and 512 proof states. ... Optimization used the Adam optimizer with a learning rate of 0.0003 and L2 regularization set to 1.0e-05. (These settings appear in the third sketch below the table.)
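
The 'Research Type' row quotes the paper's headline result for Tactician's online k-nearest neighbor solver, which learns from recently completed proofs. The sketch below is a minimal, hypothetical illustration of such an online k-NN tactic predictor: the bag-of-token featurization and overlap-based similarity are placeholders and not Tactician's actual feature extraction; only the overall idea of growing the example database as new proof steps are observed comes from the paper.

```python
from collections import Counter
from typing import List, Tuple


class OnlineKNNTacticPredictor:
    """Minimal sketch of an online k-NN tactic predictor.

    Proof states are reduced to bag-of-token features, and every newly
    observed (proof state, tactic) pair is added to the database at once,
    so suggestions can draw on proofs written moments earlier.
    """

    def __init__(self, k: int = 10):
        self.k = k
        self.examples: List[Tuple[Counter, str]] = []

    @staticmethod
    def features(proof_state: str) -> Counter:
        # Placeholder featurization: a bag of tokens of the printed state.
        return Counter(proof_state.split())

    def record(self, proof_state: str, tactic: str) -> None:
        # Online update: learn from this proof step immediately.
        self.examples.append((self.features(proof_state), tactic))

    def suggest(self, proof_state: str) -> List[str]:
        # Rank stored states by overlap with the query and return the
        # tactics of the k nearest neighbours.
        query = self.features(proof_state)

        def similarity(feats: Counter) -> float:
            inter = sum((query & feats).values())
            union = sum((query | feats).values())
            return inter / union if union else 0.0

        ranked = sorted(self.examples, key=lambda ex: similarity(ex[0]), reverse=True)
        return [tactic for _, tactic in ranked[: self.k]]
```

Because the database is just a growing list, the "online" aspect amounts to calling `record` after every proof step, which is what distinguishes this kind of solver from its offline equivalent.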
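The 'Dataset Splits' row quotes the package-level split procedure. The sketch below illustrates it under stated assumptions: `deps`, `theorem_counts`, and the function itself are hypothetical, and the choice of which side of the cut becomes the test split simply follows the quoted constraint that no test package depends on a training package (a dependency-closed prefix of the topological order).

```python
import random
from typing import Dict, List, Set, Tuple


def split_packages(
    packages: List[str],
    deps: Dict[str, Set[str]],       # deps[p] = packages that p depends on
    theorem_counts: Dict[str, int],  # number of theorems per package
    train_frac: float = 0.9,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Sketch: split packages so roughly `train_frac` of theorems land in
    training while respecting the dependency graph (assumed acyclic)."""
    rng = random.Random(seed)

    # Induce a random topological order (dependencies before dependents).
    order, placed, remaining = [], set(), set(packages)
    while remaining:
        ready = sorted(p for p in remaining if deps[p] <= placed)
        pick = rng.choice(ready)
        order.append(pick)
        placed.add(pick)
        remaining.remove(pick)

    # A prefix of the order is closed under dependencies, so taking it as
    # the test split guarantees no test package depends on a training package.
    test_budget = (1.0 - train_frac) * sum(theorem_counts.values())
    test: List[str] = []
    covered = 0
    for p in order:
        if covered >= test_budget:
            break
        test.append(p)
        covered += theorem_counts[p]
    train = order[len(test):]
    return train, test
```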
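The 'Experiment Setup' row lists the batch sizes and optimizer settings. The following sketch wires those quoted hyperparameters into a TensorFlow/Keras setup; the stand-in Dense model, the choice to apply L2 via kernel regularizers, and the omitted data pipeline are assumptions, not the authors' Graph2Tac training code.

```python
import tensorflow as tf

# Hyperparameters quoted in the paper; everything else below is a placeholder.
LEARNING_RATE = 3e-4       # Adam learning rate (0.0003)
L2_WEIGHT = 1e-5           # L2 regularization (1.0e-05)
BATCH_DEFINITIONS = 512    # definitions per batch
BATCH_PROOF_STATES = 512   # proof states per batch


def build_toy_model(hidden_dim: int = 128, num_tactics: int = 100) -> tf.keras.Model:
    """Placeholder network showing where the L2 penalty would attach."""
    reg = tf.keras.regularizers.L2(L2_WEIGHT)
    return tf.keras.Sequential([
        tf.keras.layers.Dense(hidden_dim, activation="relu", kernel_regularizer=reg),
        tf.keras.layers.Dense(num_tactics, kernel_regularizer=reg),
    ])


model = build_toy_model()
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
# Training would then proceed on batches of 512 definitions and 512 proof
# states, as reported; the actual GNN and its input pipeline are omitted here.
```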