Narrowing the Gap: Random Forests In Theory and In Practice

Authors: Misha Denil, David Matheson, Nando de Freitas

ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also provide an empirical evaluation, comparing our algorithm and other theoretically tractable random forest models to the random forest algorithm used in practice. Our experiments provide insight into the relative importance of different simplifications that theoreticians have made to obtain tractable models for analysis.
Researcher Affiliation | Academia | (1) University of Oxford, United Kingdom; (2) University of British Columbia, Canada
Pseudocode | Yes | Section 4. Algorithm. In this section we describe the workings of our random forest algorithm. Each tree in the random regression forest is constructed independently.
Open Source Code | No | The paper does not provide any concrete access to source code for the described methodology.
Open Datasets | Yes | For our first set of experiments we used four data sets from the UCI repository: Diabetes, Wine Quality, Year Prediction MSD and CT Slice.
Dataset Splits | Yes | All results in this section are the mean of five runs of five fold cross validation.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | Breiman and our own algorithm specify a minimum leaf size, which we set to 5 following Breiman’s advice for regression (Breiman, 2001). Biau08 and Biau12 are parameterized in terms of a target number of leafs rather than a minimum leaf size. For these algorithms we choose the target number of leafs to be n/5... In all the experiments in this section we follow Breiman’s rule of thumb of using one third of the total number of attributes as candidate dimensions. For our algorithm we choose m = 1000 structure points for selecting the search range in the candidate dimensions.
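
The Dataset Splits and Experiment Setup entries above can be read as a small configuration plus a resampling scheme. The following is a minimal sketch of that reading, not the authors' code (the paper releases none). The names `ForestSetup`, `make_setup`, and `repeated_kfold` are hypothetical, and two points are assumptions: that n in "n/5" is the number of training points, and that fractional values are rounded down with a floor of 1.

```python
import random
from dataclasses import dataclass


@dataclass
class ForestSetup:
    """Values quoted in the Experiment Setup entry (field names are hypothetical)."""
    min_leaf_size: int      # Breiman and the paper's own algorithm
    target_leaves: int      # Biau08 and Biau12
    candidate_dims: int     # Breiman's one-third rule of thumb
    structure_points: int   # m, used by the paper's own algorithm


def make_setup(n_train: int, n_attributes: int) -> ForestSetup:
    # Assumption: floor division with a minimum of 1. The quoted text
    # says only "n/5" and "one third" and does not specify rounding.
    return ForestSetup(
        min_leaf_size=5,
        target_leaves=max(1, n_train // 5),
        candidate_dims=max(1, n_attributes // 3),
        structure_points=1000,
    )


def repeated_kfold(n: int, folds: int = 5, runs: int = 5, seed: int = 0):
    """Yield (run, fold, train_idx, test_idx), reshuffling once per run."""
    rng = random.Random(seed)
    for run in range(runs):
        order = list(range(n))
        rng.shuffle(order)
        for fold in range(folds):
            test = order[fold::folds]  # every folds-th shuffled index
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield run, fold, train, test
```

With the defaults, `repeated_kfold` produces 25 train/test splits, and the reported figure would be the mean of the 25 resulting scores. A per-fold setup can be built with `make_setup(len(train), n_attributes)`.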