AugBoost: Gradient Boosting Enhanced with Step-Wise Feature Augmentation
Authors: Philip Tannor, Lior Rokach
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | These variations on GBDT were tested on 20 classification tasks, on which all of them outperformed GBDT and previous related work. |
| Researcher Affiliation | Academia | Philip Tannor (Tel-Aviv University) and Lior Rokach (Ben-Gurion University of the Negev); tannor@mail.tau.ac.il, liorrk@post.bgu.ac.il |
| Pseudocode | Yes | In Algorithm 1 we present the training procedure, for all three of the augmentation methods. |
| Open Source Code | Yes | Code repository: https://github.com/ptannor/augboost |
| Open Datasets | Yes | The datasets [Dheeru and Karra Taniskidou, 2017; Alcalá-Fdez et al., 2011] are from the UCI repository, Kaggle datasets, and the Keel dataset repository. |
| Dataset Splits | Yes | A random 15% of the training data was set aside for validation during the training of each ANN, and early stopping occurred if the validation loss didn't improve for 10 epochs. We performed 10-fold cross-validation and reported the mean cross-entropy. (A minimal sketch of this evaluation protocol follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as CPU or GPU models. |
| Software Dependencies | No | The paper mentions using scikit-learn and the LightGBM implementation, but does not specify their version numbers, which limits reproducibility. |
| Experiment Setup | Yes | The ANNs had a simplistic and generic architecture, which was chosen using known best practices: three fully-connected hidden layers of the same size. The number of neurons in each hidden layer was defined to be the size of the input. The hidden layers all had ReLU activation functions, and the output layer had a linear activation function. The batch size was the minimum of 300 samples and 1/15 of the data. A random 15% of the training data was set aside for validation during the training of each ANN, and early stopping occurred if the validation loss didn't improve for 10 epochs. For all methods, the number of feature subsets is set by default to three... For most of our experiments, we used n_BA = 10 and 150 iterations, i.e. we trained 150 DTs and augmented the features 15 times throughout the process. (A sketch of this setup follows the table.) |
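To make the quoted experiment setup concrete, below is a minimal sketch of the feature-extraction ANN as described in the Experiment Setup row. It assumes a Keras implementation; the helper names (`build_ann`, `train_ann`), the optimizer, the loss, and the maximum epoch count are assumptions and are not taken from the AugBoost repository.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_ann(n_features: int) -> keras.Model:
    """Three fully-connected hidden layers as wide as the input, ReLU activations,
    and a linear output layer, as quoted in the Experiment Setup row."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(n_features, activation="relu"),
        layers.Dense(n_features, activation="relu"),
        layers.Dense(n_features, activation="relu"),
        layers.Dense(1, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mse")  # optimizer and loss are assumptions
    return model


def train_ann(model: keras.Model, X: np.ndarray, y: np.ndarray) -> keras.Model:
    # Batch size: the minimum of 300 samples and 1/15 of the data (as stated in the paper).
    batch_size = min(300, max(1, len(X) // 15))
    # A random 15% of the training data is held out for validation, with early
    # stopping if the validation loss doesn't improve for 10 epochs.
    early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)
    model.fit(X, y, validation_split=0.15, batch_size=batch_size,
              epochs=1000,  # maximum epoch count is an assumption
              callbacks=[early_stop], verbose=0)
    return model
```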
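The Dataset Splits row also describes the outer evaluation protocol: 10-fold cross-validation with the mean cross-entropy reported per dataset. The sketch below illustrates that protocol using scikit-learn, with a plain LightGBM classifier standing in for the paper's models; the stratification, random seed, and function name are assumptions.

```python
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold


def mean_cv_cross_entropy(X: np.ndarray, y: np.ndarray, n_splits: int = 10) -> float:
    """Mean cross-entropy (log loss) over a stratified 10-fold split."""
    losses = []
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, test_idx in cv.split(X, y):
        # 150 boosting iterations, matching the 150 DTs mentioned in the paper;
        # the choice of LGBMClassifier as the model is an assumption.
        model = LGBMClassifier(n_estimators=150)
        model.fit(X[train_idx], y[train_idx])
        proba = model.predict_proba(X[test_idx])
        losses.append(log_loss(y[test_idx], proba, labels=model.classes_))
    return float(np.mean(losses))
```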