Pruning Random Forests for Prediction on a Budget

Authors: Feng Nan, Joseph Wang, Venkatesh Saligrama

NeurIPS 2016

Reproducibility variables, assessed results, and the LLM responses cited as evidence:
Research Type: Experimental. LLM response: "Empirically, our pruning algorithm outperforms existing state-of-the-art resource-constrained algorithms."
Researcher Affiliation: Academia. LLM response: "Feng Nan, Systems Engineering, Boston University, fnan@bu.edu; Joseph Wang, Electrical Engineering, Boston University, joewang@bu.edu; Venkatesh Saligrama, Electrical Engineering, Boston University, srv@bu.edu"
Pseudocode: Yes. LLM response: "Algorithm 1 BUDGETPRUNE"
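
For orientation, the sketch below illustrates the accuracy-cost trade-off that BUDGETPRUNE optimizes, reduced to a single decision tree and solved with a simple bottom-up dynamic program. This is a deliberate simplification: the paper's Algorithm 1 solves a coupled integer program over the whole ensemble with primal-dual updates, which this sketch does not implement, and the Node class and cost model here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    error_if_leaf: float              # empirical error if this node is collapsed to a leaf
    feature_cost: float = 0.0         # acquisition cost of the feature tested at this node
    children: List["Node"] = field(default_factory=list)

def prune(node: Node, lam: float) -> float:
    """Return the best (error + lam * cost) achievable in this subtree,
    collapsing subtrees whose accuracy gain is not worth their feature cost."""
    if not node.children:
        return node.error_if_leaf
    keep = lam * node.feature_cost + sum(prune(c, lam) for c in node.children)
    collapse = node.error_if_leaf
    if collapse <= keep:
        node.children = []            # prune here: this node becomes a leaf
        return collapse
    return keep

# Tiny hypothetical tree: the root tests a feature of cost 2; its children are near-pure leaves.
tree = Node(error_if_leaf=10.0, feature_cost=2.0,
            children=[Node(error_if_leaf=1.0), Node(error_if_leaf=2.0)])
print(prune(tree, lam=1.0))           # keeps the split: 1.0*2.0 + 1.0 + 2.0 = 5.0 < 10.0
```

Sweeping lam trades prediction error against expected feature-acquisition cost, which is the same trade-off the paper's λ controls over the full ensemble.
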
Open Source Code: No. LLM response: "The paper does not provide concrete access to source code for the described methodology. It mentions using commercial solvers such as CPLEX and Gurobi, but gives no link to, or statement about, the authors' own implementation."
Open Datasets: Yes. LLM response: "We test our pruning algorithm BUDGETPRUNE on four benchmark datasets used for prediction-time budget algorithms. The first two datasets have unknown feature acquisition costs so we assign costs to be 1 for all features; The last two datasets have real feature acquisition costs measured in terms of CPU time. ... MiniBooNE Particle Identification and Forest Covertype Datasets [7] ... Yahoo! Learning to Rank [6] ... Scene15 [13]"
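
Of the four benchmarks, only Forest Covertype has a built-in fetcher in scikit-learn; the snippet below is a minimal sketch of loading it and assigning unit feature costs as the quote describes. MiniBooNE, Yahoo! Learning to Rank, and Scene15 must be obtained separately, and nothing here comes from the authors' own pipeline.

```python
import numpy as np
from sklearn.datasets import fetch_covtype

# Forest Covertype: the only one of the four benchmarks with a scikit-learn fetcher.
covtype = fetch_covtype()
X, y = covtype.data, covtype.target

# Unknown acquisition costs -> assign cost 1 to every feature, as in the quote above.
feature_costs = np.ones(X.shape[1])
print(X.shape, y.shape, int(feature_costs.sum()))
```
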
Dataset Splits: Yes. LLM response: "There are 141397/146769/184968 examples in the training/validation/test sets. ... Following [22] we divide it into 1500/300/2685 examples for training/validation/test sets."
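
Where a split is reported only by its sizes, as with the 1500/300/2685 Scene15 partition above, it can be reproduced in shape (though not in the exact assignment used by [22]) with a fixed-seed shuffle; the helper below is a hypothetical illustration, not the authors' partition.

```python
import numpy as np

def three_way_split(n_samples: int, n_train: int, n_val: int, seed: int = 0):
    """Shuffle indices once and cut them into train/validation/test blocks."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = three_way_split(1500 + 300 + 2685, 1500, 300)
print(len(train_idx), len(val_idx), len(test_idx))   # 1500 300 2685
```
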
Hardware Specification: No. LLM response: "The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or machine specifications) used for running the experiments."
Software Dependencies: Yes. LLM response: "We implement BUDGETPRUNE using CPLEX [1] network flow solver for the primal update step. The running time is significantly reduced (from hours down to minutes) compared to directly solving the LP relaxation of (IP) using standard solvers such as Gurobi [10]."
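
No CPLEX model is included in the paper, so as a stand-in illustration of the same class of subproblem, the snippet below solves a tiny minimum-cost network flow instance with networkx's network simplex. The graph is hypothetical and is not the paper's pruning formulation; it only shows the kind of structured problem a dedicated network flow solver handles far faster than a general-purpose LP solver.

```python
import networkx as nx

# A toy min-cost flow instance: ship 2 units from s to t at minimum cost.
G = nx.DiGraph()
G.add_node("s", demand=-2)                    # source supplies 2 units
G.add_node("t", demand=2)                     # sink absorbs 2 units
G.add_edge("s", "a", capacity=2, weight=1)
G.add_edge("s", "b", capacity=1, weight=4)
G.add_edge("a", "t", capacity=1, weight=2)
G.add_edge("a", "b", capacity=1, weight=1)
G.add_edge("b", "t", capacity=2, weight=1)

flow_cost, flow = nx.network_simplex(G)       # exact min-cost flow via network simplex
print(flow_cost, flow)                        # optimal cost is 6 for this toy graph
```
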
Experiment Setup: Yes. LLM response: "For each dataset we first train a RF and apply BUDGETPRUNE on it using different λs to obtain various points on the accuracy-cost tradeoff curve. ... Our base RF consists of 40 trees using entropy split criteria and choosing from the full set of features at each split. ... Our base RF consists of 140 trees using cost-weighted entropy split criteria as in [16] and choosing from a random subset of 400 features at each split. ... Our base RF consists of 500 trees using entropy split criteria and choosing from a random subset of 20 features at each split."
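
The loop below sketches this setup: train a base random forest with the first configuration quoted above (40 trees, entropy splits, the full feature set at each split), then sweep the trade-off parameter λ to trace an accuracy-cost curve. The synthetic data, the λ grid, and budget_prune are placeholders; budget_prune in particular is a hypothetical stub, not the paper's Algorithm 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (the paper uses MiniBooNE, Covertype, Yahoo! LTR, Scene15).
X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
feature_costs = np.ones(X.shape[1])           # unit costs, as for the first two datasets

# Base RF matching the first quoted configuration: 40 trees, entropy, all features per split.
rf = RandomForestClassifier(n_estimators=40, criterion="entropy",
                            max_features=None, n_jobs=-1, random_state=0)
rf.fit(X_train, y_train)

def budget_prune(forest, X_val, y_val, costs, lam):
    """Hypothetical stand-in for BUDGETPRUNE; returns the forest unchanged."""
    return forest

curve = []
for lam in [0.0, 0.01, 0.1, 1.0]:             # assumed grid of trade-off values
    pruned = budget_prune(rf, X_test, y_test, feature_costs, lam)
    curve.append((lam, pruned.score(X_test, y_test)))
print(curve)
```
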