The Tree Ensemble Layer: Differentiability meets Conditional Computation
Authors: Hussein Hazimeh, Natalia Ponomareva, Petros Mol, Zhenyu Tan, Rahul Mazumder
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on 23 classification datasets indicate over 10x speed-ups compared to the differentiable trees used in the literature and over 20x reduction in the number of parameters compared to gradient boosted trees, while maintaining competitive performance. Moreover, experiments on CIFAR, MNIST, and Fashion MNIST indicate that replacing dense layers in CNNs with our tree layer reduces the test loss by 7-53% and the number of parameters by 8x. |
| Researcher Affiliation | Collaboration | 1Massachusetts Institute of Technology, 2Google Research, 3Google Brain. Correspondence to: Hussein Hazimeh <hazimeh@mit.edu>. |
| Pseudocode | Yes | Algorithm 1 Conditional Forward Pass |
| Open Source Code | Yes | We provide an open-source TensorFlow implementation of TEL along with a Keras interface (https://github.com/google-research/google-research/tree/master/tf_trees). [A hedged usage sketch appears after the table.] |
| Open Datasets | Yes | 23 of these are from the Penn Machine Learning Benchmarks (PMLB) (Olson et al., 2017), and the 3 remaining are CIFAR-10 (Krizhevsky et al., 2009), MNIST (LeCun et al., 1998), and Fashion MNIST (Xiao et al., 2017). |
| Dataset Splits | Yes | For all the experiments, we tune the hyperparameters using Hyperopt (Bergstra et al., 2013) with the Tree-structured Parzen Estimator (TPE). We optimize for either AUC or accuracy with stratified 5-fold cross-validation. |
| Hardware Specification | No | No specific hardware details such as GPU or CPU models, or specific cloud instance types, are mentioned in the paper. The paper states that TEL is implemented in TensorFlow 2.0 and includes experiments with CNNs, implying computational resources were used, but without specific hardware specifications. |
| Software Dependencies | Yes | TEL is implemented in TensorFlow 2.0 using custom C++ kernels for forward and backward propagation, along with a Keras Python-accessible interface. |
| Experiment Setup | Yes | For all the experiments, we tune the hyperparameters using Hyperopt (Bergstra et al., 2013) with the Tree-structured Parzen Estimator (TPE). We optimize for either AUC or accuracy with stratified 5-fold cross-validation. NNs (including TEL) were trained using Keras with the TensorFlow backend, using Adam (Kingma & Ba, 2014) and cross-entropy loss. As discussed in Section 2, TEL is always preceded by a batch normalization layer. For TEL, we tune the learning rate, batch size, and number of epochs (ranges are in the appendix). [The tuning protocol is sketched after the table.] |
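
The Open Source Code and Software Dependencies rows state that TEL ships as a Keras-accessible layer in TensorFlow 2.0. Below is a minimal sketch of how such a layer might be dropped into a Keras model. The `tf_trees` import path and the `TEL` constructor argument names are assumptions about the repository's interface, not verified from the source; only the batch-normalization-before-TEL detail comes from the paper.

```python
import tensorflow as tf
# Hypothetical import: the repository exposes a Keras layer, but the exact
# module path, class name, and constructor arguments here are assumptions.
from tf_trees import TEL

model = tf.keras.Sequential([
    # The paper notes that TEL is always preceded by a batch normalization layer.
    tf.keras.layers.BatchNormalization(),
    # Assumed argument names for the ensemble size, tree depth, and output width.
    TEL(output_logits_dim=10, trees_num=10, depth=4),
    tf.keras.layers.Activation("softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```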
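The Dataset Splits and Experiment Setup rows describe tuning with Hyperopt's TPE, scored by stratified 5-fold cross-validation, with models trained via Adam and cross-entropy. The sketch below illustrates that protocol on synthetic data; the search-space ranges, `max_evals`, and the placeholder dense model (standing in for TEL) are illustrative assumptions, since the paper's actual ranges are given in its appendix.

```python
import numpy as np
import tensorflow as tf
from hyperopt import fmin, tpe, hp
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

# Toy stand-in for one of the PMLB classification tasks.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

def build_model(learning_rate):
    # Placeholder dense model; the paper would use TEL preceded by batch norm here.
    model = tf.keras.Sequential([
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

def objective(params):
    # Stratified 5-fold cross-validated accuracy for one hyperparameter setting.
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = []
    for tr, va in skf.split(X, y):
        model = build_model(params["learning_rate"])
        model.fit(X[tr], y[tr], batch_size=int(params["batch_size"]),
                  epochs=int(params["epochs"]), verbose=0)
        scores.append(model.evaluate(X[va], y[va], verbose=0)[1])
    return -float(np.mean(scores))  # Hyperopt minimizes, so negate mean accuracy.

# Illustrative ranges only; the paper's tuning ranges are listed in its appendix.
space = {
    "learning_rate": hp.loguniform("learning_rate", np.log(1e-4), np.log(1e-1)),
    "batch_size": hp.choice("batch_size", [64, 128, 256]),
    "epochs": hp.choice("epochs", [10, 20]),
}
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=20)
print(best)
```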