Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
# Mixed-curvature decision trees and random forests
Authors: Philippe Chlenski, Quentin Chu, Raiyan R. Khan, Kaizhu Du, Antonio Khalil Moretti, Itsik Pe’er
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In benchmarks on a diverse suite of 57 classification, regression, and link prediction tasks, our product RFs ranked first on 29 tasks and came in the top 2 for 41. This highlights the value of product RFs as straightforward yet powerful new tools for data analysis in product manifolds. |
| Researcher Affiliation | Academia | 1Columbia University 2Barnard College 3Spelman College. Correspondence to: Philippe Chlenski <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Product Space Decision Tree |
| Open Source Code | Yes | Code for our method is available at https://github.com/pchlenski/manify. |
| Open Datasets | Yes | Appendix J. Datasets availability. This table lists all of the datasets used in this paper, with download links and citations. For example, CiteSeer (Giles et al., 1998) from Network Repository, MNIST (Lecun et al., 1998) from Hugging Face, Traffic (Fedesoriano, 2020) from Kaggle. |
| Dataset Splits | Yes | We apply an identical 80:20 train-test split to all of our data, train our models on the training set, and evaluate performance on the test set. |
| Hardware Specification | No | The paper mentions running experiments and evaluating models but does not provide specific details on the hardware used (e.g., specific GPU or CPU models, memory sizes). |
| Software Dependencies | No | The paper mentions several software packages and libraries such as Scikit-Learn, Manify, Geoopt, PyTorch (implicitly for Geoopt), NetworkX, and Matplotlib. However, it does not provide specific version numbers for any of these dependencies, which is required for a reproducible description. |
| Experiment Setup | Yes | Specifically, we set the following hyperparameters for both DTs and RFs: max_depth = 5, min_samples_split = 2, min_samples_leaf = 1, min_impurity_decrease = 0.0. For RFs, we also set: n_estimators = 12, max_features = "sqrt", bootstrap = True (subsamples the training data), max_samples = None. For all neural networks, we used a learning rate of 0.0001 and trained for 4,000 epochs. Both optimizers use the hyperparameters β1 = 0.9 and β2 = 0.999. Batch size: 4,096; number of samples per point: 64; β (weight for KL-divergence in VAE loss): 1. |
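The DT/RF hyperparameters reported above map directly onto scikit-learn estimator parameters. A minimal sketch, using scikit-learn's Euclidean `RandomForestClassifier` as a stand-in for the paper's product-space RF (the paper's own implementation lives in the `manify` repository); the synthetic dataset and `random_state` values here are illustrative assumptions, not from the paper:

```python
# Sketch only: illustrates the reported hyperparameters and the 80:20 split
# with a standard Euclidean random forest, NOT the paper's product-space RF.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption; the paper uses 57 benchmark tasks).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# "We apply an identical 80:20 train-test split to all of our data."
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

rf = RandomForestClassifier(
    n_estimators=12,
    max_depth=5,
    min_samples_split=2,
    min_samples_leaf=1,
    min_impurity_decrease=0.0,
    max_features="sqrt",
    bootstrap=True,       # subsamples the training data
    max_samples=None,
    random_state=0,       # illustrative; the paper does not report a seed
)
rf.fit(X_train, y_train)
test_accuracy = rf.score(X_test, y_test)
```

In scikit-learn, `max_samples=None` with `bootstrap=True` means each tree is fit on a bootstrap sample the size of the full training set, which matches the paper's stated configuration.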