Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Tree-Based Premise Selection for Lean4

Authors: Zichen Wang, Anjie Dong, Zaiwen Wen

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Comprehensive evaluation on a large theorem library extracted from Mathlib4 demonstrates that our method significantly outperforms existing premise retrieval tools across various metrics. Experimental analysis, including ablation studies and parameter sensitivity analysis, validates the contribution of individual components and highlights the efficacy of our structure-aware approach and multi-metric fusion.
Researcher Affiliation	Academia	Zichen Wang School of Mathematical Sciences Peking University, China EMAIL Anjie Dong School of Data Science The Chinese University of Hong Kong, Shenzhen, China EMAIL Zaiwen Wen Beijing International Center for Mathematical Research Peking University, China EMAIL
Pseudocode	No	The paper describes algorithms such as the CSE algorithm (Section 3.1) and the WL kernel method (Section 3.2) using detailed, numbered steps within the main text, but it does not present them in a distinct pseudocode block or algorithm figure.
Open Source Code	Yes	We evaluate under a unified protocol and release code/configuration for reproducibility.
Open Datasets	Yes	The theorem library is constructed by extracting 217,555 valid mathematical theorems from Lean4 s Mathlib4 version v4.18.0-rc1.
Dataset Splits	Yes	We construct two test sets for the evaluation of the method. The test set A consists of 100 small-scale problems... The test set B consists of m = 6119 (state, theorem) pairs extracted from the Lean proof environment...
Hardware Specification	Yes	The experiments are carried out on a local machine (Macbook Pro) equipped with an Apple M3 Pro chip (11 cores).
Software Dependencies	No	The entire clustering pipeline is implemented in Python, primarily utilizing the PCA and Mini Batch KMeans modules provided by the scikit-learn library for core computation, and psycopg2 for database interaction with the Postgre SQL database.
Experiment Setup	Yes	For these experiments, the weights were set to (α,β,γ,δ) = (0.1,0.3,0.1,0.5), and the TED costs were CI = CD = 1 and CR = 0.4, with a low-cost scaling factor of σ = 0.2. [...] Clustering is configured to find 10,000 clusters, utilizing an incremental learning approach where Mini-Batches of 10,000 samples are processed iteratively to update cluster centers. [...] projecting the vectors into a lower-dimensional space comprising 1200 principal components.