Multi-Layered Gradient Boosting Decision Trees
Authors: Ji Feng, Yang Yu, Zhi-Hua Zhou
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments confirmed the effectiveness of the model in terms of performance and representation learning ability. The experiments in this section are mainly designed to empirically examine whether it is feasible to jointly train the multi-layered structure proposed by this work. That is, we make no claims that the current structure can outperform CNNs in computer vision tasks. More specifically, we aim to examine the following questions: (Q1) Does the training procedure empirically converge? (Q2) What do the learned features look like? (Q3) Does depth help to learn a better representation? (Q4) Given the same structure, what is the performance compared with neural networks trained by either back-propagation or target-propagation? |
| Researcher Affiliation | Collaboration | Ji Feng, Yang Yu, Zhi-Hua Zhou; National Key Lab for Novel Software Technology, Nanjing University, China ({fengj, yuy, zhouzh}@lamda.nju.edu.cn); Sinovation Ventures AI Institute ({fengji}@chuangxin.com) |
| Pseudocode | Yes | Algorithm 1: Training multi-layered GBDT (mGBDT) Forest (a hedged training sketch is given after this table) |
| Open Source Code | No | The paper mentions other open-source tools (XGBoost, LightGBM) but does not provide an explicit statement or link to the source code for the 'multi-layered GBDT forest (mGBDTs)' developed in this paper. |
| Open Datasets | Yes | The income prediction dataset [Lichman, 2013] consists of 48,842 samples (32,561 for training and 16,281 for testing) of tabular data with both categorical and continuous attributes. The protein dataset [Lichman, 2013] is a 10-class classification task consisting of only 1,484 training samples, where each of the 8 input attributes is one measurement of the protein sequence; the goal is to predict protein localization sites with 10 possible choices. |
| Dataset Splits | Yes | The income prediction dataset [Lichman, 2013] consists of 48,842 samples (32,561 for training and 16,281 for testing) of tabular data with both categorical and continuous attributes. For comparison, we also trained the exact same structure (input - 128 - 128 - output) on neural networks using target propagation (NN-TargetProp) and standard back-propagation (NN-BackProp), respectively. 10-fold cross-validation is used for model evaluation since there is no test set provided (a hedged cross-validation sketch follows this table). |
| Hardware Specification | No | The paper notes that 'comparing wall-clock time is less meaningful since mGBDT and NNs use different devices (CPU vs. GPU) and different implementation optimizations' but does not provide specific models or configurations for the CPUs or GPUs used in their experiments. |
| Software Dependencies | No | The paper mentions software tools such as 'XGBoost' and the 'Adam' optimizer but does not specify version numbers or any other software dependencies with specific versioning for reproducibility. |
| Experiment Setup | Yes | The mGBDTs used for both forward and inverse mappings have a maximum depth of 5 per tree and a learning rate of 0.1. The output of the last hidden layer (which is in R^3) is visualized in Figure 2b. Clearly, the model is able to transform the data points into a representation that is easier to separate. We also conducted an unsupervised learning task for autoencoding. 10,000 points in R^3 forming an S-shaped manifold were generated, as shown in Figure 3a. Then we built an autoencoder using mGBDTs with structure (input - 5 - output) with MSE as its reconstruction loss. The hyper-parameters for tree configurations are the same as the 2-class classification task. Gaussian noise with zero mean and standard deviation of 0.3 is injected in the inverse loss L^inverse. 5 additive trees are fitted per epoch (K1 = K2 = 5), and the maximum depth is fixed at 5 (these settings are reused in the sketch below). Adam [Kingma and Ba, 2014] with a learning rate of 0.001 and ReLU activation are used for both neural-network baselines. A dropout rate of 0.25 is used for back-prop. |
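
Based on the reported pseudocode (Algorithm 1) and the hyper-parameters above, the following is a minimal, simplified sketch of the layer-wise mGBDT training loop: forward mappings are updated toward pseudo-targets obtained by target propagation through GBDT inverse mappings. It is not the authors' implementation; scikit-learn's `GradientBoostingRegressor` (wrapped in `MultiOutputRegressor`) stands in for the paper's GBDT library, helper names such as `make_gbdt` and `train_mgbdt` are illustrative, and re-fitting a small ensemble each epoch is a simplification of the paper's incremental addition of K trees per epoch.

```python
# Hedged sketch of mGBDT joint training (Algorithm 1), not the authors' code.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor


def make_gbdt(n_trees=5, max_depth=5, lr=0.1):
    # Tree settings follow the reported setup: max depth 5, learning rate 0.1,
    # K = 5 additive trees per update.
    return MultiOutputRegressor(GradientBoostingRegressor(
        n_estimators=n_trees, max_depth=max_depth, learning_rate=lr))


def train_mgbdt(X, y, hidden_dims=(128, 128), epochs=10, alpha=0.1, noise_std=0.3):
    """X: (n, d) inputs; y: 2-D targets (e.g. one-hot labels)."""
    dims = [X.shape[1], *hidden_dims, y.shape[1]]
    M = len(dims) - 1                          # number of layers
    rng = np.random.RandomState(0)
    # Random linear projections stand in for F_i before its first GBDT fit.
    proj = [None] + [0.1 * rng.randn(dims[i - 1], dims[i]) for i in range(1, M + 1)]
    F = [None] * (M + 1)                       # forward mappings, F[i]: o_{i-1} -> o_i
    G = [None] * (M + 1)                       # inverse mappings,  G[i]: o_i -> o_{i-1}

    def f(i, o):                               # apply F_i (projection until first fit)
        return o @ proj[i] if F[i] is None else F[i].predict(o)

    for epoch in range(epochs):
        # 1) Forward pass: o_0 = X, o_i = F_i(o_{i-1}).
        o = [X]
        for i in range(1, M + 1):
            o.append(f(i, o[i - 1]))

        # 2) Update inverse mappings on noise-injected pairs so that
        #    G_i(F_i(o_{i-1} + eps)) ~ o_{i-1} + eps (noise std 0.3 in the paper).
        for i in range(2, M + 1):
            noisy = o[i - 1] + noise_std * rng.randn(*o[i - 1].shape)
            G[i] = make_gbdt()
            G[i].fit(f(i, noisy), noisy)

        # 3) Pseudo-targets: gradient step on the output MSE at the top layer,
        #    then target-propagate downward through the inverse mappings.
        z = [None] * (M + 1)
        z[M] = o[M] - alpha * (o[M] - y)
        for i in range(M, 1, -1):
            z[i - 1] = G[i].predict(z[i])

        # 4) Update each forward mapping toward its pseudo-target.
        for i in range(1, M + 1):
            F[i] = make_gbdt()
            F[i].fit(o[i - 1], z[i])
    return F
```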
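
For the protein dataset, which ships with no predefined test split, evaluation could look like the 10-fold cross-validation sketch below. It reuses the illustrative `train_mgbdt` helper above and assumes a one-hot encoded label matrix; accuracy is taken by an argmax over the last layer's output.

```python
# Hedged 10-fold cross-validation sketch for the protein dataset.
import numpy as np
from sklearn.model_selection import KFold


def cross_validate(X, y_onehot, n_splits=10):
    accs = []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True,
                                     random_state=0).split(X):
        F = train_mgbdt(X[train_idx], y_onehot[train_idx])
        # Forward pass through the trained layers on the held-out fold.
        out = X[test_idx]
        for i in range(1, len(F)):
            out = F[i].predict(out)
        accs.append(np.mean(out.argmax(1) == y_onehot[test_idx].argmax(1)))
    return float(np.mean(accs))
```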