Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Minimax Rates for High-Dimensional Random Tessellation Forests
Authors: Eliza O'Reilly, Ngoc Mai Tran
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This work shows that a large class of random forests with general split directions also achieve minimax optimal rates in arbitrary dimension. This class includes STIT forests, a generalization of Mondrian forests to arbitrary split directions, and random forests derived from Poisson hyperplane tessellations. These are the first results showing that random forest variants with oblique splits can obtain minimax optimality in arbitrary dimension. Our proof technique relies on the novel application of the theory of stationary random tessellations in stochastic geometry to statistical learning theory. Keywords: random forest regression, Mondrian process, STIT tessellation, Poisson hyperplane tessellation, minimax risk bound |
| Researcher Affiliation | Academia | Eliza O'Reilly, Applied Mathematics and Statistics Department, Johns Hopkins University, Baltimore, MD 21218, USA; Ngoc Mai Tran, Department of Mathematics, University of Texas at Austin, Austin, TX 78712, USA |
| Pseudocode | No | The paper describes procedures in paragraph form, such as the 'procedure to construct a random partition' in Section 2.1, but does not include any clearly labeled pseudocode blocks or algorithms formatted as code. |
| Open Source Code | No | The paper does not contain any explicit statements about code availability, links to repositories, or mentions of code in supplementary materials for the methodology described. |
| Open Datasets | No | The paper treats random forests and statistical learning theoretically, referring only to generic 'input data' or an 'underlying data set.' It names no specific datasets and runs no experiments on data, so there is no information about public dataset availability. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments using specific datasets. Consequently, there is no mention of training, testing, or validation dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on mathematical proofs and convergence rates. It does not describe any experimental setup or the hardware used to run experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe any implementation details or experiments that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical, presenting mathematical results and proofs related to minimax rates for random forests. It does not describe any empirical experiments, and therefore, no experimental setup details like hyperparameters or training configurations are provided. |
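As the Pseudocode row notes, the paper describes its random-partition construction only in prose. Purely as an illustration of what such a procedure looks like, below is a minimal sketch of the Mondrian process, the axis-aligned special case of the STIT forests the paper generalizes. The sketch is not taken from the paper; the function name `sample_mondrian`, its interface, and the stopping rule via a lifetime budget are our own assumptions based on the standard Mondrian-process definition.

```python
import random

def sample_mondrian(cell, budget, rng):
    """Recursively partition an axis-aligned box via the Mondrian process.

    `cell` is a list of (lo, hi) intervals, one per dimension. A cell
    splits after an exponential waiting time whose rate is its linear
    dimension (the sum of its side lengths); recursion stops when the
    waiting time exceeds the remaining lifetime `budget`.
    """
    lengths = [hi - lo for lo, hi in cell]
    rate = sum(lengths)
    cost = rng.expovariate(rate) if rate > 0 else float("inf")
    if cost >= budget:
        return [cell]  # leaf cell of the final partition
    # Pick a split axis with probability proportional to side length,
    # then a split location uniformly along that side.
    axis = rng.choices(range(len(cell)), weights=lengths)[0]
    lo, hi = cell[axis]
    point = rng.uniform(lo, hi)
    left, right = list(cell), list(cell)
    left[axis] = (lo, point)
    right[axis] = (point, hi)
    remaining = budget - cost
    return (sample_mondrian(left, remaining, rng)
            + sample_mondrian(right, remaining, rng))

# Partition the unit square with lifetime budget 2.0.
partition = sample_mondrian([(0.0, 1.0), (0.0, 1.0)], budget=2.0,
                            rng=random.Random(0))
```

The oblique-split processes analyzed in the paper (STIT tessellations with general direction distributions, Poisson hyperplane tessellations) replace the axis-aligned cut above with a hyperplane whose normal is drawn from a directional distribution; that generalization is what the paper's minimax results cover.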