Pruning Random Forests for Prediction on a Budget

Authors: Feng Nan, Joseph Wang, Venkatesh Saligrama

NeurIPS 2016

Reproducibility variables, assessed results, and the LLM responses cited as evidence:
Research Type: Experimental. LLM response: "Empirically, our pruning algorithm outperforms existing state-of-the-art resource-constrained algorithms."
Researcher Affiliation: Academia. LLM response: "Feng Nan, Systems Engineering, Boston University, fnan@bu.edu; Joseph Wang, Electrical Engineering, Boston University, joewang@bu.edu; Venkatesh Saligrama, Electrical Engineering, Boston University, srv@bu.edu"
Pseudocode: Yes. LLM response: "Algorithm 1 BUDGETPRUNE"
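
For orientation, the sketch below illustrates the accuracy-cost trade-off that BUDGETPRUNE optimizes, reduced to a single decision tree and solved with a simple bottom-up dynamic program. This is a deliberate simplification: the paper's Algorithm 1 solves a coupled integer program over the whole ensemble with primal-dual updates, which this sketch does not implement, and the Node class and cost model here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    error_if_leaf: float              # empirical error if this node is collapsed to a leaf
    feature_cost: float = 0.0         # acquisition cost of the feature tested at this node
    children: List["Node"] = field(default_factory=list)

def prune(node: Node, lam: float) -> float:
    """Return the best (error + lam * cost) achievable in this subtree,
    collapsing subtrees whose accuracy gain is not worth their feature cost."""
    if not node.children:
        return node.error_if_leaf
    keep = lam * node.feature_cost + sum(prune(c, lam) for c in node.children)
    collapse = node.error_if_leaf
    if collapse <= keep:
        node.children = []            # prune here: this node becomes a leaf
        return collapse
    return keep

# Tiny hypothetical tree: the root tests a feature of cost 2; its children are near-pure leaves.
tree = Node(error_if_leaf=10.0, feature_cost=2.0,
            children=[Node(error_if_leaf=1.0), Node(error_if_leaf=2.0)])
print(prune(tree, lam=1.0))           # keeps the split: 1.0*2.0 + 1.0 + 2.0 = 5.0 < 10.0
```

Sweeping lam trades prediction error against expected feature-acquisition cost, which is the same trade-off the paper's λ controls over the full ensemble.
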
Open Source Code: No. LLM response: "The paper does not provide concrete access to source code for the described methodology. It mentions using commercial solvers such as CPLEX and Gurobi, but gives no link to, or statement about, the authors' own implementation."
Open Datasets: Yes. LLM response: "We test our pruning algorithm BUDGETPRUNE on four benchmark datasets used for prediction-time budget algorithms. The first two datasets have unknown feature acquisition costs so we assign costs to be 1 for all features; The last two datasets have real feature acquisition costs measured in terms of CPU time. ... MiniBooNE Particle Identification and Forest Covertype Datasets [7] ... Yahoo! Learning to Rank [6] ... Scene15 [13]"
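
Of the four benchmarks, only Forest Covertype has a built-in fetcher in scikit-learn; the snippet below is a minimal sketch of loading it and assigning unit feature costs as the quote describes. MiniBooNE, Yahoo! Learning to Rank, and Scene15 must be obtained separately, and nothing here comes from the authors' own pipeline.

```python
import numpy as np
from sklearn.datasets import fetch_covtype

# Forest Covertype: the only one of the four benchmarks with a scikit-learn fetcher.
covtype = fetch_covtype()
X, y = covtype.data, covtype.target

# Unknown acquisition costs -> assign cost 1 to every feature, as in the quote above.
feature_costs = np.ones(X.shape[1])
print(X.shape, y.shape, int(feature_costs.sum()))
```
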
Dataset Splits: Yes. LLM response: "There are 141397/146769/184968 examples in the training/validation/test sets. ... Following [22] we divide it into 1500/300/2685 examples for training/validation/test sets."
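
Where a split is reported only by its sizes, as with the 1500/300/2685 Scene15 partition above, it can be reproduced in shape (though not in the exact assignment used by [22]) with a fixed-seed shuffle; the helper below is a hypothetical illustration, not the authors' partition.

```python
import numpy as np

def three_way_split(n_samples: int, n_train: int, n_val: int, seed: int = 0):
    """Shuffle indices once and cut them into train/validation/test blocks."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = three_way_split(1500 + 300 + 2685, 1500, 300)
print(len(train_idx), len(val_idx), len(test_idx))   # 1500 300 2685
```
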
Hardware Specification: No. LLM response: "The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or machine specifications) used for running the experiments."
Software Dependencies: Yes. LLM response: "We implement BUDGETPRUNE using CPLEX [1] network flow solver for the primal update step. The running time is significantly reduced (from hours down to minutes) compared to directly solving the LP relaxation of (IP) using standard solvers such as Gurobi [10]."
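
No CPLEX model is included in the paper, so as a stand-in illustration of the same class of subproblem, the snippet below solves a tiny minimum-cost network flow instance with networkx's network simplex. The graph is hypothetical and is not the paper's pruning formulation; it only shows the kind of structured problem a dedicated network flow solver handles far faster than a general-purpose LP solver.

```python
import networkx as nx

# A toy min-cost flow instance: ship 2 units from s to t at minimum cost.
G = nx.DiGraph()
G.add_node("s", demand=-2)                    # source supplies 2 units
G.add_node("t", demand=2)                     # sink absorbs 2 units
G.add_edge("s", "a", capacity=2, weight=1)
G.add_edge("s", "b", capacity=1, weight=4)
G.add_edge("a", "t", capacity=1, weight=2)
G.add_edge("a", "b", capacity=1, weight=1)
G.add_edge("b", "t", capacity=2, weight=1)

flow_cost, flow = nx.network_simplex(G)       # exact min-cost flow via network simplex
print(flow_cost, flow)                        # optimal cost is 6 for this toy graph
```
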
Experiment Setup: Yes. LLM response: "For each dataset we first train a RF and apply BUDGETPRUNE on it using different λs to obtain various points on the accuracy-cost tradeoff curve. ... Our base RF consists of 40 trees using entropy split criteria and choosing from the full set of features at each split. ... Our base RF consists of 140 trees using cost-weighted entropy split criteria as in [16] and choosing from a random subset of 400 features at each split. ... Our base RF consists of 500 trees using entropy split criteria and choosing from a random subset of 20 features at each split."
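
The loop below sketches this setup: train a base random forest with the first configuration quoted above (40 trees, entropy splits, the full feature set at each split), then sweep the trade-off parameter λ to trace an accuracy-cost curve. The synthetic data, the λ grid, and budget_prune are placeholders; budget_prune in particular is a hypothetical stub, not the paper's Algorithm 1.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (the paper uses MiniBooNE, Covertype, Yahoo! LTR, Scene15).
X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
feature_costs = np.ones(X.shape[1])           # unit costs, as for the first two datasets

# Base RF matching the first quoted configuration: 40 trees, entropy, all features per split.
rf = RandomForestClassifier(n_estimators=40, criterion="entropy",
                            max_features=None, n_jobs=-1, random_state=0)
rf.fit(X_train, y_train)

def budget_prune(forest, X_val, y_val, costs, lam):
    """Hypothetical stand-in for BUDGETPRUNE; returns the forest unchanged."""
    return forest

curve = []
for lam in [0.0, 0.01, 0.1, 1.0]:             # assumed grid of trade-off values
    pruned = budget_prune(rf, X_test, y_test, feature_costs, lam)
    curve.append((lam, pruned.score(X_test, y_test)))
print(curve)
```
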