Unsupervised Discovery of Formulas for Mathematical Constants

Authors: Michael Shalyt, Uri Seligmann, Itay Beit Halachmi, Ofir David, Rotem Elimelech, Ido Kaminer

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We test our methodology on a set of 1,768,900 such formulas, identifying many known formulas for mathematical constants, and discover previously unknown formulas for π, ln(2), Gauss , and Lemniscate s constants.
Researcher Affiliation Academia Michael Shalyt Uri Seligmann Itay Beit Halachmi Ofir David Rotem Elimelech Ido Kaminer Technion Israel Institute of Technology, Haifa 3200003, Israel shalyt@technion.ac.il, uri.seligmann@gmail.com itaybe@campus.technion.ac.il, eofirdavid@gmail.com rotem.eli@campus.technion.ac.il, kaminer@technion.ac.il
Pseudocode No The paper contains a flowchart (Figure 1) describing the methodology steps, but no explicitly labeled 'Pseudocode' or 'Algorithm' block with structured code-like steps.
Open Source Code Yes Project repository: https://github.com/RamanujanMachine/Blind-Delta-Algorithm
Open Datasets No This work focuses on polynomials up to 2nd degree: an = A2n2 + A1n + A0, bn = B2n2 + B1n + B0, with integer coefficients in the domain 5 Ai 5, 5 Bi 5. We removed the a = 0 and b = 0 cases, as they break the PCF structure, leaving us with 1,768,900 formulas. The paper describes how the dataset was constructed but does not provide a direct link or formal citation to download the specific dataset they generated.
Dataset Splits Yes We filtered out all formulas that do not converge, providing the final filtered dataset of 1,543,926 formulas.
Hardware Specification Yes For the worst case PCF this requires 36MB of memory (without optimizations) and 1.9 seconds of run time on a single core of a basic workstation, which translates to an upper cap of 900 hours for the whole data set. In practice we used a high power cluster with 64 cores, which ran each iteration of the measurements in 8.5 hours.
Software Dependencies Yes Scipy (Version: 1.11.3) BSD License (Copyright (c) 2001-2002 Enthought, Inc. 2003-2024, Sci Py Developers. All rights reserved.) gmpy2 (Version: 2.1.5) GNU Lesser General Public License v3 or later Numpy (Version 1.26.1)BSD License (Copyright (c) 2005-2023, Num Py Developers. All rights reserved.)
Experiment Setup Yes Based on this set of metrics, we applied unsupervised clustering for unlabeled data (using the hierarchical density-based HDBSCAN algorithm [Mc Innes et al., 2017]). The irrationality measure: for each PCF, we calculate δpredicted (Eq.5) and compute δ directly using the Blind-δ algorithm (presented in section 3.4) at depth n = 1000. The conventional classification of the PCFs is by the coefficients of their polynomials an, bn, (A2, A1, A0, B2, B1, B0), and by their numerical limit L, which we evaluate here at depth n = 2000. Specifically, only 5 points were used for the fit.