Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Tree-Based Premise Selection for Lean4
Authors: Zichen Wang, Anjie Dong, Zaiwen Wen
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive evaluation on a large theorem library extracted from Mathlib4 demonstrates that our method significantly outperforms existing premise retrieval tools across various metrics. Experimental analysis, including ablation studies and parameter sensitivity analysis, validates the contribution of individual components and highlights the efficacy of our structure-aware approach and multi-metric fusion. |
| Researcher Affiliation | Academia | Zichen Wang School of Mathematical Sciences Peking University, China EMAIL Anjie Dong School of Data Science The Chinese University of Hong Kong, Shenzhen, China EMAIL Zaiwen Wen Beijing International Center for Mathematical Research Peking University, China EMAIL |
| Pseudocode | No | The paper describes algorithms such as the CSE algorithm (Section 3.1) and the WL kernel method (Section 3.2) using detailed, numbered steps within the main text, but it does not present them in a distinct pseudocode block or algorithm figure. |
| Open Source Code | Yes | We evaluate under a unified protocol and release code/configuration for reproducibility. |
| Open Datasets | Yes | The theorem library is constructed by extracting 217,555 valid mathematical theorems from Lean4 s Mathlib4 version v4.18.0-rc1. |
| Dataset Splits | Yes | We construct two test sets for the evaluation of the method. The test set A consists of 100 small-scale problems... The test set B consists of m = 6119 (state, theorem) pairs extracted from the Lean proof environment... |
| Hardware Specification | Yes | The experiments are carried out on a local machine (Macbook Pro) equipped with an Apple M3 Pro chip (11 cores). |
| Software Dependencies | No | The entire clustering pipeline is implemented in Python, primarily utilizing the PCA and Mini Batch KMeans modules provided by the scikit-learn library for core computation, and psycopg2 for database interaction with the Postgre SQL database. |
| Experiment Setup | Yes | For these experiments, the weights were set to (α,β,γ,δ) = (0.1,0.3,0.1,0.5), and the TED costs were CI = CD = 1 and CR = 0.4, with a low-cost scaling factor of σ = 0.2. [...] Clustering is configured to find 10,000 clusters, utilizing an incremental learning approach where Mini-Batches of 10,000 samples are processed iteratively to update cluster centers. [...] projecting the vectors into a lower-dimensional space comprising 1200 principal components. |