Oblique Decision Trees from Derivatives of ReLU Networks

Authors: Guang-He Lee, Tommi S. Jaakkola

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that our method outperforms alternative techniques for training oblique decision trees in the context of molecular property classification and regression tasks. Empirically, a locally constant network always outperforms alternative methods for training oblique decision trees by a large margin, and the ensemble of locally constant networks is competitive with classic ensemble methods. Here we evaluate the efficacy of our models (LCN, ALCN, and ELCN) using the chemical property prediction datasets from Molecule Net (Wu et al., 2018), where random forest performs competitively. We include 4 (multi-label) binary classification datasets and 1 regression dataset. The statistics are available in Table 1.
Researcher Affiliation | Academia | Guang-He Lee & Tommi S. Jaakkola, Computer Science and Artificial Intelligence Lab, MIT, {guanghe,tommi}@csail.mit.edu
Pseudocode | No | The paper describes computational procedures in numbered lists within the text (for example, under 'Computation and time complexity'), but these are not formatted as distinct pseudocode or algorithm blocks.
Open Source Code | Yes | Our implementation and data are available at https://github.com/guanghelee/iclr20-lcn.
Open Datasets | Yes | Here we evaluate the efficacy of our models (LCN, ALCN, and ELCN) using the chemical property prediction datasets from Molecule Net (Wu et al., 2018), where random forest performs competitively. We include 4 (multi-label) binary classification datasets and 1 regression dataset. The statistics are available in Table 1.
Dataset Splits | Yes | Each dataset is split into (train, validation, test) sets under the criterion specified in Molecule Net.
Hardware Specification | No | The paper does not provide specific hardware specifications (e.g., CPU or GPU models) used for running the experiments.
Software Dependencies | No | The paper names software such as scikit-learn ("For RF: we use the scikit-learn (Pedregosa et al., 2011) implementation of random forest. For GBDT: we use the scikit-learn (Pedregosa et al., 2011) implementation of gradient boosting trees.") but does not specify version numbers for any libraries or dependencies. A sketch of these baselines is given after the table.
Experiment Setup | Yes | For decision trees, LCN, LLN, and ALCN, we tune the tree depth in {2, 3, ..., 12}; for LCN, LLN, and ALCN, we also tune the DropConnect probability in {0, 0.25, 0.5, 0.75}. The models are optimized with mini-batch stochastic gradient descent with batch size 64. For all the classification tasks, we set the learning rate to 0.1, annealed by a factor of 10 every 10 epochs (30 epochs in total). For the regression task, we set the learning rate to 0.0001, annealed by a factor of 10 every 30 epochs (60 epochs in total). A sketch of this optimization schedule is given after the table.
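
As referenced in the Software Dependencies row, the following is a minimal sketch of the RF and GBDT baselines the paper attributes to scikit-learn. The data here is synthetic placeholder data, and hyperparameter values such as n_estimators and max_depth are illustrative assumptions, not settings reported in the paper.

```python
# Hedged sketch of the scikit-learn RF and GBDT baselines named in the paper.
# Placeholder data and hyperparameters; not the paper's reported configuration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for one binary classification task.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(512, 64)), rng.integers(0, 2, size=512)
X_valid, y_valid = rng.normal(size=(128, 64)), rng.integers(0, 2, size=128)

rf = RandomForestClassifier(n_estimators=500, random_state=0)
gbdt = GradientBoostingClassifier(n_estimators=500, max_depth=3, random_state=0)

for name, model in [("RF", rf), ("GBDT", gbdt)]:
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1])
    print(f"{name} validation AUC: {auc:.3f}")
```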
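The Experiment Setup row describes the classification optimization schedule: mini-batch SGD, batch size 64, learning rate 0.1 annealed by a factor of 10 every 10 epochs, 30 epochs in total. Below is a minimal sketch of that schedule; the use of PyTorch, the placeholder MLP, and the binary cross-entropy loss are assumptions of this sketch, not details confirmed by the summary above.

```python
# Hedged sketch of the reported classification training schedule:
# SGD, batch size 64, lr 0.1 annealed by 10x every 10 epochs, 30 epochs total.
# The model and loss are stand-ins, not the paper's LCN architecture.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(1024, 64)                     # placeholder features
y = torch.randint(0, 2, (1024, 1)).float()    # placeholder binary labels
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Multiply the learning rate by 0.1 every 10 epochs (annealing by a factor of 10).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

for epoch in range(30):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

For the regression task described in the same row, the analogous sketch would swap in a regression loss, a starting learning rate of 0.0001, step_size=30, and 60 epochs.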