Oblique Decision Trees from Derivatives of ReLU Networks
Authors: Guang-He Lee, Tommi S. Jaakkola
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our method outperforms alternative techniques for training oblique decision trees in the context of molecular property classification and regression tasks. Empirically, a locally constant network always outperforms alternative methods for training oblique decision trees by a large margin, and the ensemble of locally constant networks is competitive with classic ensemble methods. |
| Researcher Affiliation | Academia | Guang-He Lee & Tommi S. Jaakkola Computer Science and Artificial Intelligence Lab MIT {guanghe,tommi}@csail.mit.edu |
| Pseudocode | No | The paper describes computational procedures in numbered lists within the text, for example, under 'Computation and time complexity', but these are not formatted as distinct pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation and data are available at https://github.com/guanghelee/iclr20-lcn. |
| Open Datasets | Yes | Here we evaluate the efficacy of our models (LCN, ALCN, and ELCN) using the chemical property prediction datasets from Molecule Net (Wu et al., 2018), where random forest performs competitively. We include 4 (multi-label) binary classification datasets and 1 regression dataset. The statistics are available in Table 1. |
| Dataset Splits | Yes | Each dataset is split into (train, validation, test) sets under the criterion specified in Molecule Net. |
| Hardware Specification | No | The paper does not provide specific hardware specifications (e.g., CPU, GPU models) used for running the experiments. |
| Software Dependencies | No | For RF: we use the scikit-learn (Pedregosa et al., 2011) implementation of random forest. For GBDT: we use the scikit-learn (Pedregosa et al., 2011) implementation of gradient boosting trees. The paper names scikit-learn but does not specify version numbers for any libraries or dependencies (a hedged usage sketch of these baselines follows the table). |
| Experiment Setup | Yes | For decision trees, LCN, LLN, and ALCN, we tune the tree depth in {2, 3, ..., 12}; for LCN, LLN, and ALCN, we also tune the DropConnect probability in {0, 0.25, 0.5, 0.75}. The models are optimized with mini-batch stochastic gradient descent with batch size 64. For all the classification tasks, the learning rate is 0.1, annealed by a factor of 10 every 10 epochs (30 epochs in total); for the regression task, the learning rate is 0.0001, annealed by a factor of 10 every 30 epochs (60 epochs in total). A training-loop sketch of this schedule follows the table. |
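The Software Dependencies row states only that the RF and GBDT baselines come from scikit-learn, with no version numbers. A minimal, self-contained sketch of fitting both baselines is given below; the synthetic data, estimator counts, and variable names are assumptions for illustration, not values reported in the paper.

```python
# Hedged sketch of the scikit-learn baselines named in the paper (RF and GBDT).
# The synthetic dataset stands in for one featurized MoleculeNet task; all
# hyperparameter values here are assumed, not taken from the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder binary-classification data (assumed shapes).
X, y = make_classification(n_samples=2000, n_features=200, n_informative=50, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)

# Random forest and gradient boosting trees, as cited from scikit-learn.
rf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0).fit(X_tr, y_tr)
gbdt = GradientBoostingClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("RF   validation accuracy:", rf.score(X_va, y_va))
print("GBDT validation accuracy:", gbdt.score(X_va, y_va))
```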
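The optimization schedule quoted in the Experiment Setup row (mini-batch SGD, batch size 64, learning rate 0.1 divided by 10 every 10 of 30 epochs for classification) maps onto a standard step-decay setup. The PyTorch sketch below reproduces only that schedule; the small ReLU MLP, feature dimension, and synthetic data are assumptions, it is not an implementation of the paper's locally constant network, and the DropConnect tuning is omitted.

```python
# Hedged sketch of the reported classification optimization schedule only:
# SGD, batch size 64, lr 0.1, annealed by 10x every 10 epochs, 30 epochs total.
# The model below is a stand-in MLP, not the paper's locally constant network.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

depth = 6                                  # tree/network depth is tuned in {2, ..., 12} in the paper
X = torch.randn(1024, 100)                 # placeholder features (assumed dimension)
y = torch.randint(0, 2, (1024,)).float()   # placeholder binary labels
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

# Stand-in ReLU network whose depth mirrors the tuned depth hyperparameter.
layers = [nn.Linear(100, 64), nn.ReLU()]
for _ in range(depth - 1):
    layers += [nn.Linear(64, 64), nn.ReLU()]
layers += [nn.Linear(64, 1)]
model = nn.Sequential(*layers)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(30):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb).squeeze(-1), yb)
        loss.backward()
        optimizer.step()
    scheduler.step()                       # anneal the learning rate every 10 epochs
```

For the regression setting reported in the same row, the analogous change would be a starting learning rate of 0.0001 with `step_size=30` over 60 epochs and a regression loss in place of the binary cross-entropy.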