Learning Binary Decision Trees by Argmin Differentiation
Authors: Valentina Zantedeschi, Matt Kusner, Vlad Niculae
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate that our approach produces binary trees that are competitive with existing single tree and ensemble approaches, in both supervised and unsupervised settings." and "Our experiments demonstrate that our method leads to predictors that are competitive with state-of-the-art tree-based approaches, scaling better with the size of datasets and generalizing to many tasks." |
| Researcher Affiliation | Academia | (1) Inria, Lille Nord Europe research centre; (2) University College London, Centre for Artificial Intelligence; (3) Informatics Institute, University of Amsterdam. |
| Pseudocode | Yes | "Algorithm 1 Pruning via isotonic optimization" and "Algorithm 2 Learning with decision tree representations." |
| Open Source Code | Yes | The code for reproducing the results is available at https://github.com/vzantedeschi/LatentTrees. |
| Open Datasets | Yes | Regression: Year (Bertin-Mahieux et al., 2011), Microsoft (Qin & Liu, 2013), Yahoo (Chapelle & Chang, 2011)... Binary classification: Click (...KDD Cup 2012...), Higgs (Baldi et al., 2014)... Glass (Dua & Graff, 2017), and Covtype (Blackard & Dean, 1999; Dua & Graff, 2017). |
| Dataset Splits | Yes | "We make use of 20% of the training set as validation set for selecting the best model over training and for tuning the hyperparameters." and "We split the datasets into training/validation/test sets, with sizes 60%/20%/20%." (A split sketch is given below the table.) |
| Hardware Specification | No | Experiments are run on a machine with 16 CPUs and 64GB of RAM, with a training time limit of 3 days. |
| Software Dependencies | No | The paper mentions optimization with the Quasi-Hyperbolic Adam method and a C++ extension, but does not provide version numbers for any software dependencies or libraries. |
| Experiment Setup | No | The paper states that hyperparameters are tuned ('We tune the hyperparameters for all methods') and that optimization uses Quasi-Hyperbolic Adam, but it does not provide specific hyperparameter values such as learning rate, batch size, or number of epochs in the main text. It mentions 'Further details are provided in the supplementary'. (An optimizer sketch is given below the table.) |
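
As noted in the Dataset Splits row, the paper splits each dataset 60%/20%/20% into training/validation/test sets, with the validation portion used for model selection and hyperparameter tuning. Below is a minimal sketch of such a split, assuming scikit-learn and placeholder arrays rather than the paper's actual data-loading code (which lives in the linked repository):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for any of the benchmark datasets (e.g. Year, Higgs).
X, y = np.random.randn(1000, 10), np.random.randn(1000)

# Hold out 20% of the data as the test set.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# Reserve 20% of the original data (0.25 of the remaining 80%) as the validation set,
# used for selecting the best model and tuning hyperparameters; 60% remains for training.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0
)
```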
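
The Experiment Setup row notes that training uses Quasi-Hyperbolic Adam but defers concrete hyperparameter values to the supplementary material. The sketch below shows how such an optimizer can be wired up with the `qhoptim` package (one public implementation of QHAdam); the model is a placeholder, and the hyperparameter values are qhoptim's recommended defaults, not the paper's tuned settings:

```python
import torch
from qhoptim.pyt import QHAdam  # PyTorch implementation of Quasi-Hyperbolic Adam

# Placeholder model standing in for the paper's latent-tree predictor.
model = torch.nn.Linear(10, 1)

# Values below are qhoptim's recommended defaults, not the paper's settings.
optimizer = QHAdam(model.parameters(), lr=1e-3, nus=(0.7, 1.0), betas=(0.995, 0.999))

# One illustrative optimization step on random data.
x, y = torch.randn(32, 10), torch.randn(32, 1)
optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
```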