A Direct Boosting Approach for Semi-supervised Classification

Authors: Shaodan Zhai, Tian Xia, Zhongliang Li, Shaojun Wang

IJCAI 2015

Reproducibility assessment: each entry below lists the variable, the assessed result, and the LLM response.
Research Type: Experimental. LLM response: "Experiments on a number of UCI datasets and synthetic data show that SSDBoost gives competitive or superior results over the state-of-the-art supervised and semi-supervised boosting algorithms in the cases that the labeled data is limited, and it is very robust in noisy cases."
Researcher Affiliation: Academia. LLM response: "Shaodan Zhai, Tian Xia, Zhongliang Li, Shaojun Wang, Kno.e.sis Center, Wright State University, Dayton, US ({zhai.6, xia.7, li.141, shaojun.wang}@wright.edu)."
Pseudocode: Yes. LLM response: "Algorithm 1: Minimize the generalized 0-1 loss on D. Algorithm 2: Maximize margins on Dl ∪ Du."
Open Source Code: No. LLM response: "The paper does not provide any concrete access to source code for the methodology described."
Open Datasets: Yes. LLM response: "In this section, we first evaluate the performance of SSDBoost on 10 UCI datasets from the UCI repository [Frank and Asuncion, 2010]."
Dataset Splits: Yes. LLM response: "The classification error is estimated by 10-fold cross-validation. For each dataset, we partition it into 10 parts evenly. In each fold, we use eight parts for training, one part for validation, and the remaining part for testing."
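The 8/1/1 rotation described in the split protocol can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name, the seeded shuffle, and the choice of which part serves as the validation part in each fold are assumptions.

```python
import numpy as np

def ten_fold_splits(n_samples, seed=0):
    """Yield (train, val, test) index arrays for the 8/1/1 scheme:
    the data is partitioned into 10 even parts, and in each fold
    eight parts are used for training, one for validation, and the
    remaining one for testing."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    parts = np.array_split(indices, 10)
    for fold in range(10):
        test = parts[fold]
        val = parts[(fold + 1) % 10]  # assumed: next part rotates in as validation
        train = np.concatenate(
            [parts[i] for i in range(10) if i not in (fold, (fold + 1) % 10)]
        )
        yield train, val, test
```

Each of the 10 folds thus sees every part exactly once as the test part, matching the quoted cross-validation estimate.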
Hardware Specification: No. LLM response: "The paper mentions running times and a C++ implementation ('We implemented each semi-supervised boosting algorithm by C++'), but does not provide specific hardware details such as CPU or GPU models, or memory."
Software Dependencies: No. LLM response: "The paper mentions implementation in C++ and the use of decision trees but does not provide specific version numbers for any software dependencies."
Experiment Setup: Yes. LLM response: "For each boosting method, the depth of decision trees is chosen from 1, 2, and 3 by the validation set (for the datasets in the experiments, decision trees with a depth of 1-3 are sufficient to produce good results). For AdaBoost, ASSEMBLE, and SERBoost, the validation data is also used to perform early stopping since overfitting is observed for these methods. We run these algorithms with a maximum of 3000 iterations, and then choose the ensemble classifier from the round with minimal error on the validation data. For ASSEMBLE, SERBoost, and SSDBoost, the trade-off parameters that control the influence of unlabeled data are chosen from the values {0.001, 0.01, 0.1, 1} by the validation data. For LPBoost, DirectBoost, and SSDBoost, the parameter n′ is chosen by the validation set from the values {n/10, n/5, n/3, n/2}. For SSDBoost, the parameter m′ is chosen from the values {m/10, m/5, m/3, m/2}, and ϵ is fixed to be 0.01 since it does not significantly affect the performance as long as its value is a small number."
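The model-selection protocol quoted above amounts to two generic routines: a grid search over (tree depth, trade-off parameter) scored on the validation set, and early stopping that picks the boosting round with minimal validation error. A minimal sketch, assuming a caller-supplied `train_and_eval(depth, tradeoff)` function (hypothetical; the paper's C++ implementation is not available) that returns validation error:

```python
from itertools import product

def select_by_validation(train_and_eval, depths=(1, 2, 3),
                         tradeoffs=(0.001, 0.01, 0.1, 1.0)):
    """Return the (depth, tradeoff) pair with the lowest validation
    error, mirroring the grid described in the experiment setup."""
    return min(product(depths, tradeoffs),
               key=lambda pair: train_and_eval(*pair))

def early_stop_round(val_errors):
    """Given the validation error recorded after each boosting round
    (up to 3000 rounds in the paper), return the 1-based round with
    minimal validation error, i.e. the ensemble 'from the round with
    minimal error on the validation data'."""
    return min(range(len(val_errors)), key=val_errors.__getitem__) + 1
```

For ties, `min` keeps the earliest round, which is the usual early-stopping convention; whether the original implementation does the same is not stated in the paper.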