Efficient Non-greedy Optimization of Decision Trees
Authors: Mohammad Norouzi, Maxwell Collins, Matthew A. Johnson, David J. Fleet, Pushmeet Kohli
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on several classification benchmarks demonstrate that the resulting non-greedy decision trees outperform greedy decision tree baselines. Experiments are conducted on several benchmark datasets from LibSVM [6] for multi-class classification, namely SensIT, Connect4, Protein, and MNIST. |
| Researcher Affiliation | Collaboration | (1,4) Department of Computer Science, University of Toronto; (2) Department of Computer Science, University of Wisconsin-Madison; (3,5) Microsoft Research |
| Pseudocode | Yes | Algorithm 1: Stochastic gradient descent (SGD) algorithm for non-greedy decision tree learning. (A simplified training-loop sketch follows the table.) |
| Open Source Code | No | The paper does not provide concrete access to source code. |
| Open Datasets | Yes | Experiments are conducted on several benchmark datasets from LibSVM [6] for multi-class classification, namely SensIT, Connect4, Protein, and MNIST. |
| Dataset Splits | Yes | We use the provided train/validation/test sets when available. If such splits are not provided, we use a random 80%/20% split of the training data for train/validation, and a random 64%/16%/20% split for train/validation/test sets. (A split helper sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions LibSVM as a source for datasets but does not provide specific version numbers for any software dependencies used in the implementation (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The hyper-parameters for each method are tuned for each depth independently. We train multiple trees for different values of ν ∈ {0.1, 1, 4, 10, 43, 100}, and we pick the value of ν that produces the tree with minimum validation error. We also tune the choice of the SGD learning rate, η, in this step. To build non-greedy trees, we initially build an axis-aligned tree with split functions that threshold a single feature, optimized using conventional procedures that maximize information gain. The axis-aligned split is used to initialize a greedy variant of the tree training procedure called CO2 [18]. This provides initial values for W and Θ for the non-greedy procedure. (A grid-search sketch follows the table.) |
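
The paper's Algorithm 1 optimizes a structured-prediction upper bound on the empirical loss jointly over all split parameters W and leaf parameters Θ. The sketch below is **not** that algorithm: it substitutes a soft-routing (sigmoid) relaxation so that plain SGD gradients exist, purely to illustrate the non-greedy shape of the update, i.e., every internal node and every leaf is adjusted on each step. The class and method names (`SoftTree`, `sgd_step`) and the L2 penalty standing in for the paper's ν regularizer are assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class SoftTree:
    """Fixed-depth binary tree: one linear split per internal node (rows of W)
    and one class-logit vector per leaf (rows of Theta). Illustrative only."""

    def __init__(self, depth, n_features, n_classes):
        self.depth = depth
        self.W = rng.normal(scale=0.01, size=(2 ** depth - 1, n_features))
        self.Theta = np.zeros((2 ** depth, n_classes))
        # Precompute each leaf's root-to-leaf path as (internal node, went_right) pairs.
        self.paths = []
        for leaf in range(2 ** depth):
            node, path = 0, []
            for bit in format(leaf, f"0{depth}b"):
                right = bit == "1"
                path.append((node, right))
                node = 2 * node + (2 if right else 1)
            self.paths.append(path)

    def sgd_step(self, x, y, eta, nu):
        """One SGD update on a single example (x, y); returns the log loss."""
        d = np.clip(sigmoid(self.W @ x), 1e-9, 1 - 1e-9)   # P(go right) per node
        mu = np.array([np.prod([d[i] if r else 1.0 - d[i] for i, r in p])
                       for p in self.paths])               # P(reach leaf | x)
        q = np.array([softmax(t) for t in self.Theta])     # per-leaf class distributions
        p_y = float(mu @ q[:, y])                          # mixture prob. of the true class
        r = mu * q[:, y] / p_y                             # per-leaf responsibility

        grad_Theta = r[:, None] * (q - np.eye(q.shape[1])[y])
        grad_d = np.zeros_like(d)
        for leaf, path in enumerate(self.paths):
            for i, right in path:
                grad_d[i] += -r[leaf] / d[i] if right else r[leaf] / (1.0 - d[i])
        grad_W = (grad_d * d * (1.0 - d))[:, None] * x[None, :]

        # Non-greedy update: all splits and all leaves move together.
        # The L2 term is an assumed stand-in for the paper's nu regularizer.
        self.W -= eta * (grad_W + nu * self.W)
        self.Theta -= eta * grad_Theta
        return -np.log(p_y)

# Toy usage on random data (illustrative; not one of the paper's benchmarks).
X, Y = rng.normal(size=(500, 10)), rng.integers(0, 3, size=500)
tree = SoftTree(depth=3, n_features=10, n_classes=3)
for epoch in range(5):
    for i in rng.permutation(len(X)):
        tree.sgd_step(X[i], Y[i], eta=0.1, nu=1e-4)
```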
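The split protocol quoted in the Dataset Splits row is easy to make concrete. The helper below is a minimal sketch of those percentages only; the function name and interface are hypothetical, and the paper gives no code for this step.

```python
import numpy as np

def split_indices(n, rng, test_provided):
    """Random index splits mirroring the paper's protocol: 80%/20%
    train/validation when an official test set exists, otherwise
    64%/16%/20% train/validation/test."""
    perm = rng.permutation(n)
    if test_provided:
        cut = int(0.8 * n)
        return perm[:cut], perm[cut:]
    a, b = int(0.64 * n), int(0.8 * n)
    return perm[:a], perm[a:b], perm[b:]

# Example: MNIST ships an official test set, so only train/validation are sampled.
train_idx, val_idx = split_indices(60000, np.random.default_rng(0), test_provided=True)
```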
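Model selection in the Experiment Setup row is a grid search over ν (the values are given in the paper) and the SGD learning rate η, keeping the tree with the lowest validation error. In this sketch the η grid, the callbacks `train_fn` and `val_error_fn`, and the return convention are all assumptions; the paper only states that both hyper-parameters are tuned per depth.

```python
from itertools import product

NU_GRID = [0.1, 1, 4, 10, 43, 100]   # regularization values listed in the paper
ETA_GRID = [0.01, 0.1, 1.0]          # learning-rate grid (assumed; not given in the paper)

def select_best_tree(train_fn, val_error_fn):
    """Train one tree per (nu, eta) setting and keep the one with the
    lowest validation error. `train_fn` and `val_error_fn` are hypothetical
    callbacks standing in for the paper's training and evaluation routines."""
    best_err, best = float("inf"), None
    for nu, eta in product(NU_GRID, ETA_GRID):
        tree = train_fn(nu=nu, eta=eta)
        err = val_error_fn(tree)
        if err < best_err:
            best_err, best = err, (nu, eta, tree)
    return best_err, best
```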