Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Minimax Rates for High-Dimensional Random Tessellation Forests
Authors: Eliza O'Reilly, Ngoc Mai Tran
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This work shows that a large class of random forests with general split directions also achieve minimax optimal rates in arbitrary dimension. This class includes STIT forests, a generalization of Mondrian forests to arbitrary split directions, and random forests derived from Poisson hyperplane tessellations. These are the first results showing that random forest variants with oblique splits can obtain minimax optimality in arbitrary dimension. Our proof technique relies on the novel application of the theory of stationary random tessellations in stochastic geometry to statistical learning theory. Keywords: random forest regression, Mondrian process, STIT tessellation, Poisson hyperplane tessellation, minimax risk bound |
| Researcher Affiliation | Academia | Eliza O'Reilly, Applied Mathematics and Statistics Department, Johns Hopkins University, Baltimore, MD 21218, USA; Ngoc Mai Tran, Department of Mathematics, University of Texas at Austin, Austin, TX 78712, USA |
| Pseudocode | No | The paper describes procedures in paragraph form, such as the 'procedure to construct a random partition' in Section 2.1, but does not include any clearly labeled pseudocode blocks or algorithms formatted as code. |
| Open Source Code | No | The paper does not contain any explicit statements about code availability, links to repositories, or mentions of code in supplementary materials for the methodology described. |
| Open Datasets | No | The paper treats random forests and statistical learning theoretically, referring only to generic 'input data' or an 'underlying data set.' It names no specific datasets and runs no experiments on data, so there is no information about public dataset availability. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments using specific datasets. Consequently, there is no mention of training, testing, or validation dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on mathematical proofs and convergence rates. It does not describe any experimental setup or the hardware used to run experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe any implementation details or experiments that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical, presenting mathematical results and proofs related to minimax rates for random forests. It does not describe any empirical experiments, and therefore, no experimental setup details like hyperparameters or training configurations are provided. |
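As the Pseudocode row notes, the paper describes its random-partition construction only in prose. Purely as an illustration of what such a procedure looks like, below is a minimal sketch of the Mondrian process, the axis-aligned special case of the STIT forests the paper generalizes. The sketch is not taken from the paper; the function name `sample_mondrian`, its interface, and the stopping rule via a lifetime budget are our own assumptions based on the standard Mondrian-process definition.

```python
import random

def sample_mondrian(cell, budget, rng):
    """Recursively partition an axis-aligned box via the Mondrian process.

    `cell` is a list of (lo, hi) intervals, one per dimension. A cell
    splits after an exponential waiting time whose rate is its linear
    dimension (the sum of its side lengths); recursion stops when the
    waiting time exceeds the remaining lifetime `budget`.
    """
    lengths = [hi - lo for lo, hi in cell]
    rate = sum(lengths)
    cost = rng.expovariate(rate) if rate > 0 else float("inf")
    if cost >= budget:
        return [cell]  # leaf cell of the final partition
    # Pick a split axis with probability proportional to side length,
    # then a split location uniformly along that side.
    axis = rng.choices(range(len(cell)), weights=lengths)[0]
    lo, hi = cell[axis]
    point = rng.uniform(lo, hi)
    left, right = list(cell), list(cell)
    left[axis] = (lo, point)
    right[axis] = (point, hi)
    remaining = budget - cost
    return (sample_mondrian(left, remaining, rng)
            + sample_mondrian(right, remaining, rng))

# Partition the unit square with lifetime budget 2.0.
partition = sample_mondrian([(0.0, 1.0), (0.0, 1.0)], budget=2.0,
                            rng=random.Random(0))
```

The oblique-split processes analyzed in the paper (STIT tessellations with general direction distributions, Poisson hyperplane tessellations) replace the axis-aligned cut above with a hyperplane whose normal is drawn from a directional distribution; that generalization is what the paper's minimax results cover.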