Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Regression Trees Know Calculus

Authors: Nathan Wycoff

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Quantitative and qualitative numerical experiments reveal the capability of gradients estimated by regression trees to improve predictive analysis, solve tasks in uncertainty quantification, and provide interpretation of model behavior. [Section 5: Numerical Experiments] We now study how the proposed gradient estimator might be profitably exploited in practice. We begin with a qualitative study showing how a tree-based integrated gradient (TBIG) can facilitate local model interpretation. Then come three quantitative studies, first investigating the potential of a Tree-Based Active Subspace (TBAS) to improve prediction accuracy of a downstream tree via a rotation of the space. Subsequently, we evaluate the capacity of regression trees to estimate the Active Subspace in low and high dimension. Finally, we end with another qualitative study demonstrating how a TBAS can provide data visualization.
Researcher Affiliation	Academia	Nathan Wycoff, Department of Mathematics and Statistics, University of Massachusetts, Amherst, MA 01003
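Pseudocode	Yes	Algorithm 1: Tree-Based Gradient Estimation. $G_i \leftarrow 0 \in \mathbb{R}^P$ for all $i$. for $k \in \{1, \dots, K\}$ do: for $i \in N_k$ do: $G_i \leftarrow G_{\rho_i}$ {Get parent's estimate}; $G_i[\sigma_i] \leftarrow 2(\hat{\mu}_i^r - \hat{\mu}_i^l) / (u_{i\sigma_i} - l_{i\sigma_i})$ {Update along split direction}; end for; end for.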
Open Source Code	Yes	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We provide code reproducing our results and a README with instructions for doing so.
Open Datasets	Yes	Dataset: bike, concrete, gas, grid, keggu, kin40k, obesity, supercond ... Name (N, P, URL): concrete: N = 1,030, P = 9, https://archive.ics.uci.edu/dataset/165/; kin40k: N = 40,000, P = 9, https://github.com/alshedivat/keras-gp/kgp/datasets/kin40k.py; keggu: N = 65,554, P = 28, https://www.genome.jp/kegg/pathway.html; bike: N = 17,379, P = 13, https://archive.ics.uci.edu/dataset/560/; obesity: N = 2,111, P = 24, https://archive.ics.uci.edu/dataset/544/; gas: N = 36,733, P = 12, https://archive.ics.uci.edu/dataset/224; grid: N = 10,000, P = 13, https://archive.ics.uci.edu/dataset/471/; supercond: N = 21,263, P = 82, https://archive.ics.uci.edu/dataset/464/
Dataset Splits	Yes	We measure prediction error using 100-fold cross validation.
Hardware Specification	Yes	Running this study took about five hours on a 40 core Ubuntu machine with 128 GB of RAM.
Software Dependencies	No	The paper mentions software like Scikit-Learn and JAX but does not provide specific version numbers for these software components, which is required for a 'Yes' answer.
Experiment Setup	Yes	We used the estimator $\widehat{IG}$ of Equation 6 with M = 500 random points along a given path. Regression Tree (Depth 4), Regression Tree (Depth 8), Random Forest (Depth 4). For the DASM, we used a neural network with an additional layer of width 512 subsequent to the active subspace layer, and used gradient descent with a step size of $10^{-3}$ on the Mean Squared Error cost function.
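
The Algorithm 1 excerpt in the Pseudocode row can be read as a top-down pass over the tree: each node copies its parent's gradient estimate, then sets the coordinate along its own split direction to a finite difference of the child means over the node's width. The following is a minimal Python sketch of that reading, not the author's released code. The `Node` class, its field names, the `tree_gradients` function, and the rule that leaves simply inherit their parent's estimate are all assumptions made for illustration. The toy tree is hand-built for f(x) = 3*x0 - 2*x1 on the unit square, with child means computed analytically.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node record; field names are illustrative only."""
    parent: Optional[int]            # index of parent node (rho_i), None for the root
    depth: int                       # tree level of this node
    lower: List[float]               # per-dimension lower bounds of the node's cell (l_i)
    upper: List[float]               # per-dimension upper bounds of the node's cell (u_i)
    split_dim: Optional[int] = None  # split direction (sigma_i), None for a leaf
    mean_left: Optional[float] = None   # mean response in the left child
    mean_right: Optional[float] = None  # mean response in the right child


def tree_gradients(nodes: List[Node], n_dims: int) -> List[List[float]]:
    """Return one gradient estimate per node, processing the tree level by level."""
    grads = [[0.0] * n_dims for _ in nodes]
    max_depth = max(node.depth for node in nodes)
    for k in range(max_depth + 1):
        for i, node in enumerate(nodes):
            if node.depth != k:
                continue
            if node.parent is not None:
                # Get the parent's estimate (copied so siblings stay independent).
                grads[i] = list(grads[node.parent])
            if node.split_dim is not None:
                # Update along the split direction: difference of child means
                # divided by half the node's width in that direction.
                s = node.split_dim
                width = node.upper[s] - node.lower[s]
                grads[i][s] = 2.0 * (node.mean_right - node.mean_left) / width
            # Assumption: leaves have no split, so they keep the inherited estimate.
    return grads


# Toy tree for f(x) = 3*x0 - 2*x1 on [0, 1]^2 (child means computed by hand).
nodes = [
    Node(None, 0, [0.0, 0.0], [1.0, 1.0], split_dim=0, mean_left=-0.25, mean_right=1.25),
    Node(0, 1, [0.0, 0.0], [0.5, 1.0], split_dim=1, mean_left=0.25, mean_right=-0.75),
    Node(0, 1, [0.5, 0.0], [1.0, 1.0]),   # leaf: right child of the root
    Node(1, 2, [0.0, 0.0], [0.5, 0.5]),   # leaf
    Node(1, 2, [0.0, 0.5], [0.5, 1.0]),   # leaf
]
grads = tree_gradients(nodes, n_dims=2)
print(grads[3])  # [3.0, -2.0]
print(grads[2])  # [3.0, 0.0]
```

On this toy tree, the leaves below the second split recover the true gradient (3, -2), while the leaf that was split only along x0 has no information about x1 and keeps a zero in that coordinate. With a fitted tree, the node bounds and child means would have to be extracted from the fitted model, which this sketch does not attempt.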