Efficient Non-greedy Optimization of Decision Trees

Authors: Mohammad Norouzi, Maxwell Collins, Matthew A. Johnson, David J. Fleet, Pushmeet Kohli

NeurIPS 2015

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments on several classification benchmarks demonstrate that the resulting non-greedy decision trees outperform greedy decision tree baselines. Experiments are conducted on several benchmark datasets from LibSVM [6] for multi-class classification, namely SensIT, Connect4, Protein, and MNIST. |
| Researcher Affiliation | Collaboration | 1,4 Department of Computer Science, University of Toronto; 2 Department of Computer Science, University of Wisconsin-Madison; 3,5 Microsoft Research |
| Pseudocode | Yes | Algorithm 1: Stochastic gradient descent (SGD) algorithm for non-greedy decision tree learning (an illustrative sketch follows the table). |
| Open Source Code | No | The paper does not provide concrete access to source code. |
| Open Datasets | Yes | Experiments are conducted on several benchmark datasets from LibSVM [6] for multi-class classification, namely SensIT, Connect4, Protein, and MNIST. |
| Dataset Splits | Yes | We use the provided train/validation/test sets when available. If such splits are not provided, we use a random 80%/20% split of the training data for train/validation, and a random 64%/16%/20% split for train/validation/test sets (see the split sketch following the table). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions LibSVM as a source for datasets but does not provide specific version numbers for any software dependencies used in the implementation (e.g., Python, PyTorch, TensorFlow, etc.). |
| Experiment Setup | Yes | The hyper-parameters for each method are tuned for each depth independently. We train multiple trees for different values of ν ∈ {0.1, 1, 4, 10, 43, 100}, and we pick the value of ν that produces the tree with minimum validation error. We also tune the choice of the SGD learning rate, η, in this step (see the tuning sketch following the table). To build non-greedy trees, we initially build an axis-aligned tree with split functions that threshold a single feature, optimized using conventional procedures that maximize information gain. The axis-aligned split is used to initialize a greedy variant of the tree training procedure called CO2 [18]. This provides initial values for W and Θ for the non-greedy procedure. |
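
The paper's Algorithm 1 performs SGD on an upper bound of the tree's empirical loss via a loss-augmented inference step that is not reproduced here. As a rough illustration only, the sketch below swaps in a simpler soft (sigmoid) routing surrogate with a squared loss to show the general shape of jointly updating the split weights W and leaf parameters Θ with SGD; the function names, the surrogate loss, and the routing scheme are all our assumptions, not the authors' method.

```python
import numpy as np

def soft_tree_forward(x, W, Theta, depth):
    """Leaf reach probabilities mu under sigmoid routing, and yhat = mu @ Theta.

    W:     (2**depth - 1, p) internal-node weights in heap order (root first).
    Theta: (2**depth, n_classes) leaf parameter vectors.
    """
    s = 1.0 / (1.0 + np.exp(-W @ x))      # gate activation at each internal node
    mu = np.empty(2 ** depth)
    for leaf in range(2 ** depth):
        prob, node = 1.0, 1               # heap indexing: root = 1
        for bit in range(depth - 1, -1, -1):
            go_right = (leaf >> bit) & 1
            prob *= s[node - 1] if go_right else 1.0 - s[node - 1]
            node = 2 * node + go_right
        mu[leaf] = prob
    return s, mu, mu @ Theta

def sgd_step(x, y, W, Theta, depth, lr=0.1):
    """One SGD update on the squared loss ||yhat - y||^2 / 2 (illustrative surrogate)."""
    s, mu, yhat = soft_tree_forward(x, W, Theta, depth)
    r = yhat - y                          # residual = dL/dyhat
    dTheta = np.outer(mu, r)              # dL/dTheta_l = mu_l * r
    dW = np.zeros_like(W)
    for leaf in range(2 ** depth):
        node = 1
        for bit in range(depth - 1, -1, -1):
            go_right = (leaf >> bit) & 1
            gate = s[node - 1] if go_right else 1.0 - s[node - 1]
            sign = 1.0 if go_right else -1.0
            # Chain rule: dL/ds_node through this leaf, then ds/dw = s(1-s) x.
            dL_ds = (r @ Theta[leaf]) * sign * mu[leaf] / max(gate, 1e-12)
            dW[node - 1] += dL_ds * s[node - 1] * (1.0 - s[node - 1]) * x
            node = 2 * node + go_right
    W -= lr * dW
    Theta -= lr * dTheta
    return W, Theta
```

Heap indexing keeps the parent/child arithmetic trivial, and the 1e-12 guard avoids dividing by a fully saturated gate.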
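
The quoted split protocol translates directly into index arithmetic. A minimal sketch, assuming a hypothetical helper; the split fractions come from the quoted text:

```python
import numpy as np

def make_splits(n_samples, test_provided, seed=0):
    """Random index splits per the quoted protocol (hypothetical helper).

    test_provided=True : 80%/20% train/validation over the training data.
    test_provided=False: 64%/16%/20% train/validation/test.
    """
    idx = np.random.default_rng(seed).permutation(n_samples)
    if test_provided:
        cut = int(0.8 * n_samples)
        return idx[:cut], idx[cut:]
    a, b = int(0.64 * n_samples), int(0.80 * n_samples)
    return idx[:a], idx[a:b], idx[b:]
```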
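
The quoted tuning procedure is an ordinary grid search over ν and η with validation-error model selection, run independently per depth. A sketch under stated assumptions: `train_tree` and `val_error` stand in for the paper's training and evaluation routines, and the η grid is invented (the paper says η is tuned but gives no values):

```python
from itertools import product

NU_GRID = (0.1, 1, 4, 10, 43, 100)   # values quoted from the paper
ETA_GRID = (0.01, 0.1, 1.0)          # assumed; the paper does not list an eta grid

def tune_for_depth(train_tree, val_error, depth):
    """Pick the (nu, eta) pair minimizing validation error at a fixed tree depth."""
    best_err, best_tree, best_cfg = float("inf"), None, None
    for nu, eta in product(NU_GRID, ETA_GRID):
        tree = train_tree(depth=depth, nu=nu, eta=eta)
        err = val_error(tree)
        if err < best_err:
            best_err, best_tree, best_cfg = err, tree, (nu, eta)
    return best_tree, best_cfg, best_err
```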