Multi-Layered Gradient Boosting Decision Trees
Authors: Ji Feng, Yang Yu, Zhi-Hua Zhou
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments confirmed the effectiveness of the model in terms of performance and representation learning ability. The experiments in this section are mainly designed to empirically examine whether it is feasible to jointly train the multi-layered structure proposed by this work. That is, we make no claims that the current structure can outperform CNNs in computer vision tasks. More specifically, we aim to examine the following questions: (Q1) Does the training procedure empirically converge? (Q2) What do the learned features look like? (Q3) Does depth help to learn a better representation? (Q4) Given the same structure, what is the performance compared with neural networks trained by either back-propagation or target-propagation? |
| Researcher Affiliation | Collaboration | Ji Feng, Yang Yu, Zhi-Hua Zhou; National Key Lab for Novel Software Technology, Nanjing University, China ({fengj, yuy, zhouzh}@lamda.nju.edu.cn); Sinovation Ventures AI Institute ({fengji}@chuangxin.com) |
| Pseudocode | Yes | Algorithm 1: Training multi-layered GBDT (mGBDT) Forest (a hedged training sketch is given after this table) |
| Open Source Code | No | The paper mentions other open-source tools (XGBoost, LightGBM) but does not provide an explicit statement or link to the source code for the 'multi-layered GBDT forest (mGBDTs)' developed in this paper. |
| Open Datasets | Yes | The income prediction dataset [Lichman, 2013] consists of 48,842 samples (32,561 for training and 16,281 for testing) of tabular data with both categorical and continuous attributes. The protein dataset [Lichman, 2013] is a 10-class classification task consisting of only 1,484 training samples, where each of the 8 input attributes is one measurement of the protein sequence; the goal is to predict protein localization sites with 10 possible choices. |
| Dataset Splits | Yes | The income prediction dataset [Lichman, 2013] consists of 48,842 samples (32,561 for training and 16,281 for testing) of tabular data with both categorical and continuous attributes. For comparison, we also trained the exact same structure (input - 128 - 128 - output) on neural networks using target propagation (NN-TargetProp) and standard back-propagation (NN-BackProp), respectively. 10-fold cross-validation is used for model evaluation since there is no test set provided (a hedged cross-validation sketch follows this table). |
| Hardware Specification | No | The paper notes that 'comparing wall-clock time is less meaningful since mGBDT and NNs use different devices (CPU vs. GPU) and different implementation optimizations' but does not provide specific models or configurations for the CPUs or GPUs used in their experiments. |
| Software Dependencies | No | The paper mentions software tools such as 'XGBoost' and the 'Adam' optimizer but does not specify version numbers or any other software dependencies with specific versioning for reproducibility. |
| Experiment Setup | Yes | The mGBDTs used for both forward and inverse mappings have a maximum depth of 5 per tree and a learning rate of 0.1. The output of the last hidden layer (which is in R^3) is visualized in Figure 2b. Clearly, the model is able to transform the data points into a representation that is easier to separate. We also conducted an unsupervised learning task for autoencoding. 10,000 points in R^3 forming an S-shaped manifold were generated, as shown in Figure 3a. Then we built an autoencoder using mGBDTs with structure (input - 5 - output) with MSE as its reconstruction loss. The hyper-parameters for tree configurations are the same as the 2-class classification task. Gaussian noise with zero mean and standard deviation of 0.3 is injected in the inverse loss L^inverse. 5 additive trees are fitted per epoch (K1 = K2 = 5), and the maximum depth is fixed at 5 (these settings are reused in the sketch below). Adam [Kingma and Ba, 2014] with a learning rate of 0.001 and ReLU activation are used for both neural-network baselines. A dropout rate of 0.25 is used for back-prop. |
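
Based on the reported pseudocode (Algorithm 1) and the hyper-parameters above, the following is a minimal, simplified sketch of the layer-wise mGBDT training loop: forward mappings are updated toward pseudo-targets obtained by target propagation through GBDT inverse mappings. It is not the authors' implementation; scikit-learn's `GradientBoostingRegressor` (wrapped in `MultiOutputRegressor`) stands in for the paper's GBDT library, helper names such as `make_gbdt` and `train_mgbdt` are illustrative, and re-fitting a small ensemble each epoch is a simplification of the paper's incremental addition of K trees per epoch.

```python
# Hedged sketch of mGBDT joint training (Algorithm 1), not the authors' code.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor


def make_gbdt(n_trees=5, max_depth=5, lr=0.1):
    # Tree settings follow the reported setup: max depth 5, learning rate 0.1,
    # K = 5 additive trees per update.
    return MultiOutputRegressor(GradientBoostingRegressor(
        n_estimators=n_trees, max_depth=max_depth, learning_rate=lr))


def train_mgbdt(X, y, hidden_dims=(128, 128), epochs=10, alpha=0.1, noise_std=0.3):
    """X: (n, d) inputs; y: 2-D targets (e.g. one-hot labels)."""
    dims = [X.shape[1], *hidden_dims, y.shape[1]]
    M = len(dims) - 1                          # number of layers
    rng = np.random.RandomState(0)
    # Random linear projections stand in for F_i before its first GBDT fit.
    proj = [None] + [0.1 * rng.randn(dims[i - 1], dims[i]) for i in range(1, M + 1)]
    F = [None] * (M + 1)                       # forward mappings, F[i]: o_{i-1} -> o_i
    G = [None] * (M + 1)                       # inverse mappings,  G[i]: o_i -> o_{i-1}

    def f(i, o):                               # apply F_i (projection until first fit)
        return o @ proj[i] if F[i] is None else F[i].predict(o)

    for epoch in range(epochs):
        # 1) Forward pass: o_0 = X, o_i = F_i(o_{i-1}).
        o = [X]
        for i in range(1, M + 1):
            o.append(f(i, o[i - 1]))

        # 2) Update inverse mappings on noise-injected pairs so that
        #    G_i(F_i(o_{i-1} + eps)) ~ o_{i-1} + eps (noise std 0.3 in the paper).
        for i in range(2, M + 1):
            noisy = o[i - 1] + noise_std * rng.randn(*o[i - 1].shape)
            G[i] = make_gbdt()
            G[i].fit(f(i, noisy), noisy)

        # 3) Pseudo-targets: gradient step on the output MSE at the top layer,
        #    then target-propagate downward through the inverse mappings.
        z = [None] * (M + 1)
        z[M] = o[M] - alpha * (o[M] - y)
        for i in range(M, 1, -1):
            z[i - 1] = G[i].predict(z[i])

        # 4) Update each forward mapping toward its pseudo-target.
        for i in range(1, M + 1):
            F[i] = make_gbdt()
            F[i].fit(o[i - 1], z[i])
    return F
```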
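
For the protein dataset, which ships with no predefined test split, evaluation could look like the 10-fold cross-validation sketch below. It reuses the illustrative `train_mgbdt` helper above and assumes a one-hot encoded label matrix; accuracy is taken by an argmax over the last layer's output.

```python
# Hedged 10-fold cross-validation sketch for the protein dataset.
import numpy as np
from sklearn.model_selection import KFold


def cross_validate(X, y_onehot, n_splits=10):
    accs = []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True,
                                     random_state=0).split(X):
        F = train_mgbdt(X[train_idx], y_onehot[train_idx])
        # Forward pass through the trained layers on the held-out fold.
        out = X[test_idx]
        for i in range(1, len(F)):
            out = F[i].predict(out)
        accs.append(np.mean(out.argmax(1) == y_onehot[test_idx].argmax(1)))
    return float(np.mean(accs))
```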