Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

From Euler to AI: Unifying Formulas for Mathematical Constants

Authors: Tomer Raz, Michael Shalyt, Elyasheev Leibtag, Rotem Kalisch, Shachar Weinbaum, Yaron Hadad, Ido Kaminer

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Applying this approach to 455,050 arXiv papers, we validate 385 distinct formulas for π and prove relations between 360 (94%) of them, of which 166 (43%) can be derived from a single mathematical object linking canonical formulas by Euler, Gauss, Brouncker, and newer ones from algorithmic discoveries by the Ramanujan Machine. Our system combines large language models (LLMs) for systematic formula harvesting, an LLM-code feedback loop for validation, and a novel symbolic algorithm for clustering and eventual unification. We demonstrate this methodology on the hallmark case of π, an ideal testing ground for symbolic unification. Benchmarking...
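The validation step can be illustrated with a minimal sketch. It assumes that "validating" a harvested formula means evaluating a truncation of it exactly and comparing the result with π to a fixed tolerance; the helpers `bbp_partial_sum` and `validate_formula`, the Bailey-Borwein-Plouffe series used as the test formula, and the tolerance are all illustrative choices, not the paper's LLM-code loop. A real check would likely need arbitrary-precision arithmetic to confirm more digits than a float can hold.

```python
from fractions import Fraction
from typing import Callable


def bbp_partial_sum(n_terms: int) -> Fraction:
    """Exact partial sum of the Bailey-Borwein-Plouffe series for pi."""
    total = Fraction(0)
    for k in range(n_terms):
        term = (Fraction(4, 8 * k + 1) - Fraction(2, 8 * k + 4)
                - Fraction(1, 8 * k + 5) - Fraction(1, 8 * k + 6))
        total += term / 16 ** k
    return total


def validate_formula(partial_sum: Callable[[int], Fraction], target: float,
                     n_terms: int = 20, tol: float = 1e-12) -> bool:
    """Hypothetical check: does a truncated evaluation agree with the target constant?"""
    value = partial_sum(n_terms)
    return abs(float(value) - target) < tol
```

Because the partial sums are exact fractions, the only rounding happens in the final float comparison; a candidate such as the constant 22/7 fails the same check.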
Researcher Affiliation	Academia	Tomer Raz, Michael Shalyt, Elyasheev Leibtag, Rotem Kalisch, Shachar Weinbaum, Yaron Hadad, Ido Kaminer. Technion – Israel Institute of Technology, Haifa 3200003, Israel. Corresponding author: EMAIL
Pseudocode	Yes	Figure 4: The matching algorithm: connecting polynomial linear recurrences. This algorithm is demonstrated here for polynomial continued fractions (PCFs) but can be generalized to any linear polynomial recurrence. Appendix C Algorithms: This section contains an in-depth description of the algorithms discussed in Section 3. The algorithms are ordered top-down, from the highest level algorithm to the lowest.
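Why a polynomial continued fraction (PCF) counts as a polynomial linear recurrence can be seen from how it is evaluated: the numerators p_n and denominators q_n of its convergents both satisfy x_n = a(n) x_{n-1} + b(n) x_{n-2}. The sketch below assumes the convention a(0) + b(1)/(a(1) + b(2)/(a(2) + ...)); the helper `pcf_convergent` is hypothetical, and the classical continued fraction π = 3 + 1²/(6 + 3²/(6 + 5²/(6 + ...))) is chosen for illustration rather than taken from the paper's dataset.

```python
from fractions import Fraction
from typing import Callable


def pcf_convergent(a: Callable[[int], int], b: Callable[[int], int],
                   depth: int) -> Fraction:
    """Evaluate a(0) + b(1)/(a(1) + b(2)/(a(2) + ...)) truncated after `depth` levels.

    p_n and q_n obey the same second-order recurrence
    x_n = a(n) * x_{n-1} + b(n) * x_{n-2}.
    """
    p_prev, p = 1, a(0)  # p_{-1}, p_0
    q_prev, q = 0, 1     # q_{-1}, q_0
    for n in range(1, depth + 1):
        p_prev, p = p, a(n) * p + b(n) * p_prev
        q_prev, q = q, a(n) * q + b(n) * q_prev
    return Fraction(p, q)


def pi_a(n: int) -> int:
    """Partial denominators: 3 for n = 0, then 6."""
    return 3 if n == 0 else 6


def pi_b(n: int) -> int:
    """Partial numerators: (2n - 1)^2."""
    return (2 * n - 1) ** 2
```

The first convergents are 19/6 and 47/15, and depth 200 already agrees with π to better than 10⁻⁶; this slow, polynomial-rate convergence is typical of PCFs.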
Open Source Code	Yes	Project repository: https://github.com/RamanujanMachine/euler2ai
Open Datasets	Yes	Applying this approach to 455,050 arXiv papers... [5] arXiv.org submitters. Kaggle arXiv dataset, 2024. 455,050 articles from the following categories, which were indexed in the arXiv metadata dataset [5] as of 24 November 2024, were scraped.
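Selecting articles by category from the Kaggle arXiv metadata can be sketched as follows, assuming the metadata is stored as one JSON object per line with a space-separated `categories` field. The category set `EXAMPLE_CATEGORIES` is hypothetical; the paper's actual category list is not reproduced in this excerpt.

```python
import json
from typing import Iterable, Iterator, Set

# Hypothetical category set for illustration only.
EXAMPLE_CATEGORIES = {"math.NT", "math.CA"}


def filter_by_category(lines: Iterable[str], wanted: Set[str]) -> Iterator[dict]:
    """Yield records (one JSON object per line) sharing at least one category with `wanted`."""
    for line in lines:
        record = json.loads(line)
        categories = set(record.get("categories", "").split())
        if categories & wanted:
            yield record
```

To apply it to the Kaggle snapshot (assumed here to be the JSON-lines file `arxiv-metadata-oai-snapshot.json`), pass the open file object as `lines`; streaming line by line avoids loading the whole snapshot into memory.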
Dataset Splits	No	The paper processes 455,050 arXiv articles to extract and validate 385 distinct formulas for π, which then serve as input to the unification algorithm. This is a data-processing pipeline followed by analysis of the processed data; the paper does not define explicit train/validation/test splits for evaluating a model's performance.
Hardware Specification	Yes	All algorithms used in the pipeline were run on a 13th Gen Intel Core i5-13500H and are available at https://github.com/RamanujanMachine/euler2ai. Runs required for the sensitivity study were conducted on the Technion High Performance Computing Zeus Cluster.
Software Dependencies	No	The paper mentions several software components, such as SymPy [38], a Mathematica package by RISC [33], and a Maple package [57] for minimality, as well as LLMs such as OpenAI's GPT-4o, Claude 3.7 Sonnet, and Gemini 2.5 Pro Preview. However, it does not give version numbers for SymPy, Mathematica, or Maple, which are needed to reproduce the software environment.
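One way to close this gap for the Python side of such a pipeline is to record installed package versions at run time. The sketch below uses only the standard library; the helper name `record_versions` is hypothetical, and Mathematica or Maple versions would have to be recorded from within those systems.

```python
import platform
from importlib import metadata
from typing import Dict, List, Optional


def record_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each package (None if missing) plus the Python version."""
    versions: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

For example, `record_versions(["sympy", "mpmath"])` returns a dictionary that can be saved alongside the results.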
Experiment Setup	Yes	In our experiments, we use N = 200 partial sums when converting each series into a corresponding recurrence. The hyperparameter sensitivity study (Appendix D) supports this choice. UMAPS with N ≥ 2d + 1 suffices to recover the coboundary matrix. Appendix D provides a detailed sensitivity study covering different δ-clustering granularities and similarity thresholds, values of UMAPS's fit depth (N), and the sensitivity of RISC's Guess algorithm to its fit depth (N).
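As an illustration of why a series has a corresponding recurrence, consider a series whose term ratio r(n) = t_n / t_{n-1} is rational in n. Since S_n - S_{n-1} = t_n = r(n)(S_{n-1} - S_{n-2}), the partial sums satisfy S_n = (1 + r(n)) S_{n-1} - r(n) S_{n-2}. This is the classical term-ratio construction, not necessarily the paper's conversion procedure; the helpers below and the choice of the Leibniz series π/4 = Σ (-1)^k/(2k+1) are hypothetical.

```python
from fractions import Fraction
from typing import Callable, List


def partial_sums_via_recurrence(term: Callable[[int], Fraction],
                                ratio: Callable[[int], Fraction],
                                n_sums: int) -> List[Fraction]:
    """Generate S_0..S_{n_sums-1} from S_n = (1 + r(n)) S_{n-1} - r(n) S_{n-2}."""
    sums = [term(0), term(0) + term(1)]  # S_0, S_1 seed the recurrence
    for n in range(2, n_sums):
        r = ratio(n)
        sums.append((1 + r) * sums[-1] - r * sums[-2])
    return sums[:n_sums]


def leibniz_term(k: int) -> Fraction:
    """t_k = (-1)^k / (2k + 1)."""
    return Fraction((-1) ** k, 2 * k + 1)


def leibniz_ratio(n: int) -> Fraction:
    """r(n) = t_n / t_{n-1} = -(2n - 1) / (2n + 1)."""
    return Fraction(-(2 * n - 1), 2 * n + 1)
```

With N = 200, the sums generated this way are identical to the directly summed ones. Clearing the denominator gives the polynomial recurrence (2n + 1) S_n = 2 S_{n-1} + (2n - 1) S_{n-2}, the kind of object the matching step compares.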