Learning Binary Decision Trees by Argmin Differentiation

Authors: Valentina Zantedeschi, Matt Kusner, Vlad Niculae

Venue: ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate that our approach produces binary trees that are competitive with existing single tree and ensemble approaches, in both supervised and unsupervised settings." and "Our experiments demonstrate that our method leads to predictors that are competitive with state-of-the-art tree-based approaches, scaling better with the size of datasets and generalizing to many tasks."
Researcher Affiliation | Academia | Inria, Lille Nord Europe research centre; University College London, Centre for Artificial Intelligence; Informatics Institute, University of Amsterdam.
Pseudocode | Yes | "Algorithm 1 Pruning via isotonic optimization" and "Algorithm 2 Learning with decision tree representations." (See Sketch 1 after the table.)
Open Source Code | Yes | "The code for reproducing the results is available at https://github.com/vzantedeschi/LatentTrees."
Open Datasets | Yes | Regression: Year (Bertin-Mahieux et al., 2011), Microsoft (Qin & Liu, 2013), Yahoo (Chapelle & Chang, 2011)... Binary classification: Click (...KDD Cup 2012...), Higgs (Baldi et al., 2014)... Glass (Dua & Graff, 2017), and Covtype (Blackard & Dean, 1999; Dua & Graff, 2017). (See Sketch 2 after the table.)
Dataset Splits | Yes | "We make use of 20% of the training set as validation set for selecting the best model over training and for tuning the hyperparameters." and "We split the datasets into training/validation/test sets, with sizes 60%/20%/20%." (See Sketch 3 after the table.)
Hardware Specification | No | The paper states only that "Experiments are run on a machine with 16 CPUs and 64GB of RAM, with a training time limit of 3 days." without naming CPU models, GPUs, or other hardware details.
Software Dependencies | No | The paper mentions optimizing using the Quasi-Hyperbolic Adam method and a C++ extension, but does not provide specific version numbers for any software dependencies or libraries. (See Sketch 4 after the table.)
Experiment Setup | No | The paper states that hyperparameters are tuned ("We tune the hyperparameters for all methods") and that optimization uses Quasi-Hyperbolic Adam, but it does not report specific values such as learning rate, batch size, or number of epochs in the main text, noting only that "Further details are provided in the supplementary." (See Sketch 5 after the table.)
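
Illustrative code sketches (not taken from the paper or its released repository):

Sketch 1 (re: Pseudocode). Algorithm 1 casts pruning as an isotonic optimization problem over the tree's nodes. As a point of reference only, the snippet below runs ordinary isotonic regression with scikit-learn on toy data; it illustrates the monotone-fitting subproblem family, not the authors' tree-structured solver.

    # Reference sketch: ordinary isotonic regression with scikit-learn.
    # This is NOT the paper's Algorithm 1; it only shows the kind of
    # monotone-fitting problem that "pruning via isotonic optimization" builds on.
    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    rng = np.random.default_rng(0)
    x = np.arange(20)
    y = np.log1p(x) + rng.normal(scale=0.2, size=x.shape)  # noisy increasing signal

    iso = IsotonicRegression(increasing=True)
    y_fit = iso.fit_transform(x, y)                  # pool-adjacent-violators solution
    print(bool(np.all(np.diff(y_fit) >= -1e-12)))    # fitted values are non-decreasing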
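Sketch 2 (re: Open Datasets). Several of the cited datasets are publicly downloadable; for example, Covtype can be fetched through scikit-learn's built-in loader. The paper's own data pipeline is not shown here and may differ.

    # Illustrative only: fetching one of the cited public datasets (Covtype)
    # via scikit-learn's downloader. Downloads on first call, then caches locally.
    from sklearn.datasets import fetch_covtype

    covtype = fetch_covtype()
    X, y = covtype.data, covtype.target
    print(X.shape, y.shape)   # ~581k samples, 54 features, 7 forest-cover classes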
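Sketch 3 (re: Dataset Splits). The quoted 60%/20%/20% proportions can be reproduced with two calls to train_test_split; whether the paper shuffles or uses predefined folds per dataset is an assumption here.

    # Assumption: a random 60/20/20 split. Some benchmarks (e.g. Yahoo, Microsoft)
    # may come with predefined splits, which is not stated in the quoted text.
    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.random.rand(1000, 10)              # placeholder features
    y = np.random.randint(0, 2, size=1000)    # placeholder labels

    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
    # -> 60% train, 20% validation (model selection and tuning), 20% test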
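Sketch 4 (re: Software Dependencies). The paper names Quasi-Hyperbolic Adam but no library or version. One common implementation is the qhoptim package; the package choice and the hyperparameter values below (the QHAdam authors' recommended defaults) are assumptions, not values reported in the paper.

    # Assumption: the qhoptim PyTorch implementation of QHAdam (pip install qhoptim).
    # lr, nus, and betas below are qhoptim's suggested defaults, not this paper's settings.
    import torch
    from qhoptim.pyt import QHAdam

    model = torch.nn.Linear(54, 1)   # placeholder model
    optimizer = QHAdam(model.parameters(), lr=1e-3, nus=(0.7, 1.0), betas=(0.995, 0.999))

    loss = model(torch.randn(8, 54)).pow(2).mean()   # dummy forward pass and loss
    loss.backward()
    optimizer.step()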
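Sketch 5 (re: Experiment Setup). Concrete hyperparameter values are deferred to the supplementary, so the grid below is hypothetical; the loop only mirrors the stated protocol of selecting the best model on the 20% validation split.

    # Hypothetical tuning loop: the grid values and the train_and_eval callable are
    # placeholders, not the paper's actual search space. It mirrors the protocol of
    # keeping the configuration that scores best on the held-out validation set.
    def tune(train_and_eval, grid):
        """train_and_eval(params) -> validation score (higher is better)."""
        best_params, best_score = None, float("-inf")
        for params in grid:
            score = train_and_eval(params)
            if score > best_score:
                best_params, best_score = params, score
        return best_params

    grid = [{"lr": lr, "batch_size": bs} for lr in (1e-3, 1e-2) for bs in (128, 512)]
    # best = tune(my_train_and_eval, grid)   # my_train_and_eval is user-supplied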