Semi-Supervised Classification Based on Classification from Positive and Unlabeled Data

Authors: Tomoya Sakai, Marthinus Christoffel du Plessis, Gang Niu, Masashi Sugiyama

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments, we demonstrate the usefulness of the proposed methods.
Researcher Affiliation | Academia | The University of Tokyo, Japan; RIKEN, Japan.
Pseudocode | No | The paper describes its procedures in text, but provides no structured pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions the third-party LIBSVM software (http://www.csie.ntu.edu.tw/~cjlin/libsvm) but does not provide access to the authors' own implementation of the proposed methods.
Open Datasets | Yes | We used sixteen benchmark data sets taken from the UCI Machine Learning Repository (Lichman, 2013), the Semi-Supervised Learning book (Chapelle et al., 2006), LIBSVM (Chang & Lin, 2011), the ELENA Project, and a paper by Chapelle & Zien (2005).
Dataset Splits | Yes | We selected all hyper-parameters with validation samples of size 20 ($n_\mathrm{P}^\mathrm{V} = n_\mathrm{N}^\mathrm{V} = 10$). To compute the variance of the empirical PN and PNU risks, $\mathrm{Var}[\widehat{R}_\mathrm{PN}(\widehat{g}_\mathrm{PN})]$ and $\mathrm{Var}[\widehat{R}^{\eta}_\mathrm{PNU}(\widehat{g}_\mathrm{PN})]$, we repeatedly drew additional $n_\mathrm{P}^\mathrm{V} = 10$ positive, $n_\mathrm{N}^\mathrm{V} = 10$ negative, and $n_\mathrm{U}^\mathrm{V}$ unlabeled samples from the rest of the data set. (A sketch of this validation procedure follows the table.)
Hardware Specification | Yes | All experiments were carried out on a PC equipped with two 2.60 GHz Intel Xeon E5-2640 v3 CPUs.
Software Dependencies | No | The paper mentions software such as Caffe and LIBSVM but does not give version numbers for these or any other software dependencies.
Experiment Setup | Yes | As a classifier, we use the Gaussian kernel model $g(x) = \sum_{i=1}^{n} w_i \exp(-\lVert x - x_i \rVert^2 / (2\sigma^2))$, where $n = n_\mathrm{P} + n_\mathrm{N}$, $\{w_i\}_{i=1}^{n}$ are the parameters, $\{x_i\}_{i=1}^{n} = \mathcal{X}_\mathrm{P} \cup \mathcal{X}_\mathrm{N}$, and $\sigma > 0$ is the Gaussian bandwidth. The bandwidth candidates are $\{1/8, 1/4, 1/2, 1, 3/2, 2\} \times \mathrm{median}(\{\lVert x_i - x_j \rVert\}_{i,j=1}^{n})$. The classifier trained by minimizing the empirical PN risk is denoted by $\widehat{g}_\mathrm{PN}$. The number of labeled samples for training was 20, and the class prior was 0.5. In all experiments, the squared loss was used for training. (A sketch of this setup follows the table.)
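
For concreteness, here is a minimal sketch of the Experiment Setup row: a Gaussian kernel model trained by minimizing the empirical PN risk with the squared loss, which for +/-1 labels reduces to (regularized) least squares. The ridge parameter `lam` is an assumption for illustration; the quoted setup does not specify how the objective is regularized.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gram matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def bandwidth_candidates(X):
    """{1/8, 1/4, 1/2, 1, 3/2, 2} times the median pairwise distance."""
    dists = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    return np.array([1/8, 1/4, 1/2, 1.0, 3/2, 2.0]) * np.median(dists)

def train_pn(XP, XN, sigma, lam=1e-3):
    """Fit g(x) = sum_i w_i exp(-||x - x_i||^2 / (2 sigma^2)) by
    squared-loss PN risk minimization on +/-1 labels; `lam` is an
    assumed ridge parameter, not specified in the paper excerpt."""
    X = np.vstack([XP, XN])
    y = np.concatenate([np.ones(len(XP)), -np.ones(len(XN))])
    K = gaussian_kernel(X, X, sigma)
    w = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return X, w

def decision_values(Xtest, X, w, sigma):
    """Evaluate g on test points; the predicted label is sign(g)."""
    return gaussian_kernel(Xtest, X, sigma) @ w
```

With 10 positive and 10 negative training samples (20 labeled samples at class prior 0.5), this matches the scale of the quoted setup.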
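Similarly, a hedged sketch of the validation procedure from the Dataset Splits row, reusing the helpers above: hyper-parameters are chosen to minimize the empirical PN risk on a validation set of $n_\mathrm{P}^\mathrm{V} = n_\mathrm{N}^\mathrm{V} = 10$ samples, and the variance of the risk estimate is computed by repeatedly drawing fresh validation samples. The loss scaling, the `lam_grid`, and the `draw_validation` callback are illustrative assumptions, not details given in the paper excerpt.

```python
def empirical_pn_risk(gP, gN, prior=0.5):
    """Empirical PN risk with the squared loss on +/-1 targets,
    using class prior 0.5 as in the quoted setup."""
    return prior * np.mean((gP - 1.0) ** 2) + (1 - prior) * np.mean((gN + 1.0) ** 2)

def select_model(XP, XN, XPv, XNv, lam_grid=(1e-3, 1e-2, 1e-1)):
    """Pick (sigma, lam) minimizing the validation PN risk."""
    best = None
    for sigma in bandwidth_candidates(np.vstack([XP, XN])):
        for lam in lam_grid:
            X, w = train_pn(XP, XN, sigma, lam)
            risk = empirical_pn_risk(decision_values(XPv, X, w, sigma),
                                     decision_values(XNv, X, w, sigma))
            if best is None or risk < best[0]:
                best = (risk, sigma, lam)
    return best  # (validation risk, sigma, lam)

def risk_variance(draw_validation, X, w, sigma, trials=100):
    """Estimate Var[R_hat(g_hat)] by repeatedly drawing fresh validation
    samples, as described above. `draw_validation` is a hypothetical
    callback returning (XPv, XNv) drawn from the held-out data."""
    risks = [empirical_pn_risk(decision_values(XPv, X, w, sigma),
                               decision_values(XNv, X, w, sigma))
             for XPv, XNv in (draw_validation() for _ in range(trials))]
    return np.var(risks)
```

The paper also computes the variance of the PNU risk estimator, $\mathrm{Var}[\widehat{R}^{\eta}_\mathrm{PNU}(\widehat{g}_\mathrm{PN})]$, which additionally draws $n_\mathrm{U}^\mathrm{V}$ unlabeled validation samples; that estimator is omitted from this sketch.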