Pruning Random Forests for Prediction on a Budget
Authors: Feng Nan, Joseph Wang, Venkatesh Saligrama
Venue: NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, our pruning algorithm outperforms existing state-of-the-art resource-constrained algorithms. |
| Researcher Affiliation | Academia | Feng Nan (Systems Engineering, Boston University, fnan@bu.edu); Joseph Wang (Electrical Engineering, Boston University, joewang@bu.edu); Venkatesh Saligrama (Electrical Engineering, Boston University, srv@bu.edu) |
| Pseudocode | Yes | Algorithm 1 BUDGETPRUNE |
| Open Source Code | No | The paper does not provide concrete access to source code for the described methodology. It mentions using the commercial solvers CPLEX and Gurobi, but gives no link to, or statement about, its own implementation. |
| Open Datasets | Yes | We test our pruning algorithm BUDGETPRUNE on four benchmark datasets used for prediction-time budget algorithms. The first two datasets have unknown feature acquisition costs, so we assign costs to be 1 for all features; the last two datasets have real feature acquisition costs measured in terms of CPU time. ... MiniBooNE Particle Identification and Forest Covertype Datasets [7] ... Yahoo! Learning to Rank [6] ... Scene15 [13] (see the dataset-loading sketch after the table). |
| Dataset Splits | Yes | There are 141397/146769/184968 examples in the training/validation/test sets. ... Following [22] we divide it into 1500/300/2685 examples for training/validation/test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or machine specifications) used for running the experiments. |
| Software Dependencies | Yes | We implement BUDGETPRUNE using the CPLEX [1] network flow solver for the primal update step. The running time is significantly reduced (from hours down to minutes) compared to directly solving the LP relaxation of (IP) using standard solvers such as Gurobi [10]. (See the illustrative solver sketch after the table.) |
| Experiment Setup | Yes | For each dataset we first train a RF and apply BUDGETPRUNE on it using different λs to obtain various points on the accuracy-cost tradeoff curve. ... Our base RF consists of 40 trees using entropy split criteria and choosing from the full set of features at each split. ... Our base RF consists of 140 trees using cost weighted entropy split criteria as in [16] and choosing from a random subset of 400 features at each split. ... Our base RF consists of 500 trees using entropy split criteria and choosing from a random subset of 20 features at each split. (See the training-configuration sketch after the table.) |
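The Forest Covertype benchmark cited in the Open Datasets row is publicly available; the snippet below is a minimal sketch, assuming scikit-learn's `fetch_covtype` as a convenient stand-in and not necessarily how the authors obtained the data.

```python
# Illustrative only: one assumed way to obtain the Forest Covertype benchmark.
# The paper does not say how the data were downloaded; scikit-learn's
# fetch_covtype is used here purely as a convenient stand-in.
from sklearn.datasets import fetch_covtype

covtype = fetch_covtype()
X, y = covtype.data, covtype.target
print(X.shape, y.shape)  # 581012 examples with 54 features, plus labels
```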
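The Software Dependencies row quotes the authors as solving the LP relaxation of their pruning integer program (IP) with Gurobi. The following is a toy sketch of an LP relaxation written against gurobipy; the tree structure, per-node errors, per-node costs, the trade-off weight, and the one-leaf-per-path constraint are all hypothetical illustrations of the general pattern, not the paper's actual (IP) formulation.

```python
# Toy sketch only: an LP relaxation of a pruning-style integer program solved
# with gurobipy. The tree, per-node errors, per-node feature costs, and the
# trade-off weight lam are made-up illustrations; this is NOT the paper's (IP)
# formulation, just the pattern of relaxing binary node-selection variables
# to [0, 1] and handing the resulting LP to Gurobi.
import gurobipy as gp
from gurobipy import GRB

# Hypothetical tree: node 0 is the root; 1 and 2 are internal; 3-6 are leaves.
paths = [[0, 1, 3], [0, 1, 4], [0, 2, 5], [0, 2, 6]]  # root-to-leaf paths
error = {0: 0.40, 1: 0.25, 2: 0.30, 3: 0.10, 4: 0.12, 5: 0.15, 6: 0.11}
cost = {0: 0.0, 1: 1.0, 2: 1.0, 3: 2.0, 4: 2.0, 5: 2.0, 6: 2.0}
lam = 0.5  # hypothetical accuracy/cost trade-off weight

model = gp.Model("toy_pruning_lp_relaxation")
# Relaxed indicator z[h]: node h becomes a leaf of the pruned tree.
z = model.addVars(error.keys(), lb=0.0, ub=1.0, name="z")
# Each root-to-leaf path terminates at exactly one node of the pruned tree.
for path in paths:
    model.addConstr(gp.quicksum(z[h] for h in path) == 1)
model.setObjective(
    gp.quicksum((error[h] + lam * cost[h]) * z[h] for h in error), GRB.MINIMIZE
)
model.optimize()
print({h: round(z[h].X, 2) for h in error})
```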
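As a minimal sketch of the quoted base-forest configuration for the first two datasets (40 trees, entropy splits, full feature set at each split), assuming scikit-learn's `RandomForestClassifier` rather than the authors' own implementation:

```python
# Minimal sketch, not the authors' code: the quoted base random forest for the
# first two datasets, assuming scikit-learn as a stand-in implementation.
from sklearn.ensemble import RandomForestClassifier

base_rf = RandomForestClassifier(
    n_estimators=40,      # "40 trees"
    criterion="entropy",  # "entropy split criteria"
    max_features=None,    # "choosing from the full set of features at each split"
    random_state=0,       # assumed seed for repeatability; not stated in the paper
)
# base_rf.fit(X_train, y_train)  # X_train, y_train: the dataset's training split
```

Swapping `n_estimators` and `max_features` matches the quoted settings for the other datasets (140 trees with 400 candidate features, 500 trees with 20 candidate features); the cost-weighted entropy criterion from [16] has no stock scikit-learn equivalent and would need a custom splitter.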