A Direct Boosting Approach for Semi-supervised Classification

Authors: Shaodan Zhai, Tian Xia, Zhongliang Li, Shaojun Wang

IJCAI 2015

Reproducibility assessment: each entry below lists the variable, the assessed result, and the LLM response.
Research Type: Experimental. LLM response: "Experiments on a number of UCI datasets and synthetic data show that SSDBoost gives competitive or superior results over the state-of-the-art supervised and semi-supervised boosting algorithms in the cases that the labeled data is limited, and it is very robust in noisy cases."
Researcher Affiliation: Academia. LLM response: "Shaodan Zhai, Tian Xia, Zhongliang Li, Shaojun Wang, Kno.e.sis Center, Wright State University, Dayton, US ({zhai.6, xia.7, li.141, shaojun.wang}@wright.edu)."
Pseudocode: Yes. LLM response: "Algorithm 1: Minimize the generalized 0-1 loss on D. Algorithm 2: Maximize margins on Dl ∪ Du."
Open Source Code: No. LLM response: "The paper does not provide any concrete access to source code for the methodology described."
Open Datasets: Yes. LLM response: "In this section, we first evaluate the performance of SSDBoost on 10 UCI datasets from the UCI repository [Frank and Asuncion, 2010]."
Dataset Splits: Yes. LLM response: "The classification error is estimated by 10-fold cross-validation. For each dataset, we partition it into 10 parts evenly. In each fold, we use eight parts for training, one part for validation, and the remaining part for testing."
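The 8/1/1 rotation described in the split protocol can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the function name, the seeded shuffle, and the choice of which part serves as the validation part in each fold are assumptions.

```python
import numpy as np

def ten_fold_splits(n_samples, seed=0):
    """Yield (train, val, test) index arrays for the 8/1/1 scheme:
    the data is partitioned into 10 even parts, and in each fold
    eight parts are used for training, one for validation, and the
    remaining one for testing."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    parts = np.array_split(indices, 10)
    for fold in range(10):
        test = parts[fold]
        val = parts[(fold + 1) % 10]  # assumed: next part rotates in as validation
        train = np.concatenate(
            [parts[i] for i in range(10) if i not in (fold, (fold + 1) % 10)]
        )
        yield train, val, test
```

Each of the 10 folds thus sees every part exactly once as the test part, matching the quoted cross-validation estimate.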
Hardware Specification: No. LLM response: "The paper mentions running times and a C++ implementation ('We implemented each semi-supervised boosting algorithm by C++'), but does not provide specific hardware details such as CPU or GPU models, or memory."
Software Dependencies: No. LLM response: "The paper mentions implementation in C++ and the use of decision trees but does not provide specific version numbers for any software dependencies."
Experiment Setup: Yes. LLM response: "For each boosting method, the depth of decision trees is chosen from 1, 2, and 3 by the validation set (for the datasets in the experiments, decision trees with a depth of 1-3 are sufficient to produce good results). For AdaBoost, ASSEMBLE, and SERBoost, the validation data is also used to perform early stopping since overfitting is observed for these methods. We run these algorithms with a maximum of 3000 iterations, and then choose the ensemble classifier from the round with minimal error on the validation data. For ASSEMBLE, SERBoost, and SSDBoost, the trade-off parameters that control the influence of unlabeled data are chosen from the values {0.001, 0.01, 0.1, 1} by the validation data. For LPBoost, DirectBoost, and SSDBoost, the parameter n′ is chosen by the validation set from the values {n/10, n/5, n/3, n/2}. For SSDBoost, the parameter m′ is chosen from the values {m/10, m/5, m/3, m/2}, and ϵ is fixed to be 0.01 since it does not significantly affect the performance as long as its value is a small number."
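The model-selection protocol quoted above amounts to two generic routines: a grid search over (tree depth, trade-off parameter) scored on the validation set, and early stopping that picks the boosting round with minimal validation error. A minimal sketch, assuming a caller-supplied `train_and_eval(depth, tradeoff)` function (hypothetical; the paper's C++ implementation is not available) that returns validation error:

```python
from itertools import product

def select_by_validation(train_and_eval, depths=(1, 2, 3),
                         tradeoffs=(0.001, 0.01, 0.1, 1.0)):
    """Return the (depth, tradeoff) pair with the lowest validation
    error, mirroring the grid described in the experiment setup."""
    return min(product(depths, tradeoffs),
               key=lambda pair: train_and_eval(*pair))

def early_stop_round(val_errors):
    """Given the validation error recorded after each boosting round
    (up to 3000 rounds in the paper), return the 1-based round with
    minimal validation error, i.e. the ensemble 'from the round with
    minimal error on the validation data'."""
    return min(range(len(val_errors)), key=val_errors.__getitem__) + 1
```

For ties, `min` keeps the earliest round, which is the usual early-stopping convention; whether the original implementation does the same is not stated in the paper.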