Tree in Tree: from Decision Trees to Decision Graphs

Authors: Bingzhao Zhu, Mahsa Shoaran

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Each entry below pairs a reproducibility variable with its assessed result and the supporting excerpt (LLM response) from the paper.
Research Type: Experimental
"Compared to decision trees, we show that TnT achieves better classification performance with reduced model size, both as a stand-alone classifier and as a base estimator in bagging/AdaBoost ensembles. Our proposed model is a novel, more efficient, and accurate alternative to the widely-used decision trees. We test the TnT decision graph as a stand-alone classifier and benchmark it against several state-of-the-art decision tree/graph algorithms."
Researcher Affiliation: Academia
Bingzhao Zhu (EPFL, Lausanne, Switzerland and Cornell University, Ithaca, NY, USA; bz323@cornell.edu); Mahsa Shoaran (EPFL, Lausanne, Switzerland; mahsa.shoaran@epfl.ch)
Pseudocode: Yes
Algorithm 1: Naive decision graph (NDG) [16]; Algorithm 2: Tree in Tree (TnT)
Open Source Code: Yes
"We provide a Python implementation of the proposed TnT decision graph at https://github.com/BingzhaoZhu/TnTDecisionGraph."
Open Datasets: Yes
"We include the following datasets: MNIST, Connect-4, Letter, Optical recognition, Pendigits, Protein, SenseIT, and USPS from the UCI machine learning repository [30] and LIBSVM Dataset [31] under Creative Commons Attribution-Share Alike 3.0 license."
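As a sanity check, comparable copies of these datasets can be pulled programmatically. The sketch below is one way to do so with scikit-learn; the OpenML MNIST copy and the LIBSVM file name are assumptions, not the paper's exact download sources.

    from sklearn.datasets import fetch_openml, load_svmlight_file

    # MNIST (70k samples, 784 features) from OpenML; the paper's UCI/LIBSVM
    # copies may differ slightly in formatting and train/test partitioning.
    mnist = fetch_openml("mnist_784", version=1, as_frame=False)
    X, y = mnist.data, mnist.target

    # Datasets distributed in LIBSVM format (e.g., Pendigits, USPS) can be
    # parsed directly once downloaded; the file path below is a placeholder.
    X_pen, y_pen = load_svmlight_file("pendigits.libsvm")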
Dataset Splits: Yes
"If a separate test set is not available for some tasks, we randomly partition 33% of the entire data as test set. For all models, we repeat the training procedure five times with different random seeds. We removed the complexity constraints on both models and selected the hyperparameters that achieve the highest cross-validation accuracy on the training set."
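A minimal sketch of this split-and-repeat protocol, assuming X and y as loaded above; a scikit-learn DecisionTreeClassifier stands in for TnT, whose interface is sketched further below.

    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier  # stand-in for TnT

    # Hold out 33% as the test set and repeat over five random seeds,
    # mirroring the evaluation protocol quoted above.
    scores = []
    for seed in range(5):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.33, random_state=seed
        )
        clf = DecisionTreeClassifier(random_state=seed).fit(X_tr, y_tr)
        scores.append(clf.score(X_te, y_te))

    print(f"mean accuracy over 5 seeds: {sum(scores) / len(scores):.4f}")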
Hardware Specification: Yes
"Testing our Python implementation on an Intel i7-9700 CPU, it took 325.3 seconds to build a TnT of 1k splits on the MNIST dataset (60k samples, 784 features, 10 classes)."
Software Dependencies: No
The paper mentions Python and the scikit-learn library but gives no version numbers for these components. For example: "The ensemble methods are implemented using the scikit-learn library in Python (under the 3-Clause BSD license) [37]."
Experiment Setup: Yes
"In the following experiments, we set the hyperparameters as N1 = 2, N2 = 5. The regularization coefficient C is used to control the complexity of decision graphs in TnT. The complexity penalty is fixed at C = 3e-4 on all tasks."
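For completeness, one plausible way to wire these hyperparameters into the released implementation is sketched below. The import path and constructor arguments (N1, N2, C) are assumptions that mirror the paper's notation; the actual interface should be checked against https://github.com/BingzhaoZhu/TnTDecisionGraph.

    # Hypothetical usage of the released TnT code; names are modeled on the
    # paper's notation (N1 = 2, N2 = 5, complexity penalty C = 3e-4) and may
    # differ from the repository's actual API.
    from TnT import TnT  # assumed import from the cloned repository

    graph = TnT(N1=2, N2=5, C=3e-4)
    graph.fit(X_tr, y_tr)  # reusing the split from the sketch above
    print("test accuracy:", graph.score(X_te, y_te))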