reproducibilityindex.ai

Narrowing the Gap: Random Forests In Theory and In Practice

Authors: Misha Denil, David Matheson, Nando De Freitas

ICML 2014

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We also provide an empirical evaluation, comparing our algorithm and other theoretically tractable random forest models to the random forest algorithm used in practice. Our experiments provide insight into the relative importance of different simplifications that theoreticians have made to obtain tractable models for analysis.
Researcher Affiliation	Academia	(1) University of Oxford, United Kingdom; (2) University of British Columbia, Canada
Pseudocode	Yes	Section 4. Algorithm. In this section we describe the workings of our random forest algorithm. Each tree in the random regression forest is constructed independently.
Open Source Code	No	The paper does not provide any concrete access to source code for the described methodology.
Open Datasets	Yes	For our first set of experiments we used four data sets from the UCI repository: Diabetes, Wine Quality, Year Prediction MSD and CT Slice.
Dataset Splits	Yes	All results in this section are the mean of five runs of five-fold cross validation.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup	Yes	Breiman and our own algorithm specify a minimum leaf size, which we set to 5 following Breiman’s advice for regression (Breiman, 2001). Biau08 and Biau12 are parameterized in terms of a target number of leafs rather than a minimum leaf size. For these algorithms we choose the target number of leafs to be n/5... In all the experiments in this section we follow Breiman’s rule of thumb of using one third of the total number of attributes as candidate dimensions. For our algorithm we choose m = 1000 structure points for selecting the search range in the candidate dimensions.
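
The Experiment Setup and Dataset Splits rows can be read together as a small configuration plus an evaluation loop. The Python sketch below is only an illustration of that reading, not the authors' code (the page notes that none was released). The function names, dictionary keys, integer rounding of n/5 and of one third of the attributes, and the shuffling scheme are all assumptions made for the example.

```python
import random


def experiment_settings(n_samples, n_features, n_structure_points=1000):
    """Collect the hyperparameters quoted in the Experiment Setup row.

    All names here are hypothetical; only the values come from the quoted text.
    """
    return {
        # Minimum leaf size used for Breiman and the paper's own algorithm.
        "min_leaf_size": 5,
        # Biau08 and Biau12 use a target number of leafs of n/5
        # (floor division is an assumption; the quote does not say how to round).
        "target_num_leafs": n_samples // 5,
        # Breiman's rule of thumb: one third of the attributes as candidate
        # dimensions (rounding down, with a floor of 1, is an assumption).
        "num_candidate_dims": max(1, n_features // 3),
        # m = 1000 structure points for selecting the search range.
        "num_structure_points": n_structure_points,
    }


def repeated_kfold_indices(n_samples, n_folds=5, n_runs=5, seed=0):
    """Yield (run, fold, train_idx, test_idx) for n_runs of n_folds-fold CV.

    Each run reshuffles the sample indices; the seed and the shuffling
    strategy are assumptions, since the quoted text does not describe them.
    """
    rng = random.Random(seed)
    for run in range(n_runs):
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(n_folds):
            # Every n_folds-th shuffled index, starting at `fold`, is held out.
            test_idx = order[fold::n_folds]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield run, fold, train_idx, test_idx
```

Under these assumptions, a reported result would be the mean of the 25 per-fold scores (five runs times five folds) obtained by training on `train_idx` and scoring on `test_idx` with the settings returned by `experiment_settings`.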