Narrowing the Gap: Random Forests In Theory and In Practice
Authors: Misha Denil, David Matheson, Nando De Freitas
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also provide an empirical evaluation, comparing our algorithm and other theoretically tractable random forest models to the random forest algorithm used in practice. Our experiments provide insight into the relative importance of different simplifications that theoreticians have made to obtain tractable models for analysis. |
| Researcher Affiliation | Academia | 1University of Oxford, United Kingdom 2University of British Columbia, Canada |
| Pseudocode | Yes | Section 4. Algorithm. In this section we describe the workings of our random forest algorithm. Each tree in the random regression forest is constructed independently. |
| Open Source Code | No | The paper does not provide access to source code for the described methodology. |
| Open Datasets | Yes | For our first set of experiments we used four data sets from the UCI repository: Diabetes, Wine Quality, Year Prediction MSD and CT Slice. |
| Dataset Splits | Yes | All results in this section are the mean of five runs of five-fold cross validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | Breiman and our own algorithm specify a minimum leaf size, which we set to 5 following Breiman’s advice for regression (Breiman, 2001). Biau08 and Biau12 are parameterized in terms of a target number of leafs rather than a minimum leaf size. For these algorithms we choose the target number of leafs to be n/5... In all the experiments in this section we follow Breiman’s rule of thumb of using one third of the total number of attributes as candidate dimensions. For our algorithm we choose m = 1000 structure points for selecting the search range in the candidate dimensions. |
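The evaluation protocol and hyperparameters quoted above (five runs of five-fold cross validation, minimum leaf size of 5, one third of attributes as candidate dimensions) can be sketched with an off-the-shelf random forest. This is not the authors' algorithm, which additionally uses m = 1000 structure points to select split search ranges; it only mirrors the Breiman-style settings reported in the table. The forest size (`n_estimators`) and the use of scikit-learn's built-in diabetes data as a stand-in for the UCI Diabetes set are assumptions, not details from the paper.

```python
# Hedged sketch of the reported protocol using scikit-learn's
# RandomForestRegressor; the paper's own algorithm differs.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Assumption: scikit-learn's diabetes data stands in for UCI Diabetes.
X, y = load_diabetes(return_X_y=True)
n_features = X.shape[1]

mse_scores = []
for run in range(5):                                   # five runs of ...
    kf = KFold(n_splits=5, shuffle=True, random_state=run)  # ... five-fold CV
    for train_idx, test_idx in kf.split(X):
        model = RandomForestRegressor(
            n_estimators=100,                  # assumption: forest size not quoted
            min_samples_leaf=5,                # Breiman's regression rule of thumb
            max_features=max(1, n_features // 3),  # one third of the attributes
            random_state=run,
        )
        model.fit(X[train_idx], y[train_idx])
        mse_scores.append(
            mean_squared_error(y[test_idx], model.predict(X[test_idx]))
        )

# Report the mean over all 5 x 5 = 25 folds, as in the paper's protocol.
print(f"mean MSE over 25 folds: {np.mean(mse_scores):.2f}")
```

Averaging over five independently shuffled five-fold splits, as the paper does, reduces the variance of the reported score relative to a single cross-validation run.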