Semi-Supervised Active Learning with Cross-Class Sample Transfer

Authors: Yuchen Guo, Guiguang Ding, Yue Gao, Jianmin Wang

IJCAI 2016

Reproducibility — Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on three datasets verify the efficacy of the proposed method. We carry out comprehensive empirical analysis on three benchmark datasets. The results show that the proposed CC-SS-AL requires much fewer labeled samples in the target domain than the conventional SS-AL methods to achieve the same accuracy, which validates its efficacy.
Researcher Affiliation | Academia | Tsinghua National Laboratory for Information Science and Technology (TNList), School of Software, Tsinghua University, Beijing 100084, China. {yuchen.w.guo,kevin.gaoy}@gmail.com, {dinggg,jimwang}@tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1 CC-SS-AL
Input: Source domain data Ds, target domain pool Dp; label semantic vector a_c for all c ∈ Cs ∪ Ct;
Output: Classifiers w_c for the target domain, ∀c ∈ Ct;
1: Initialize L by random seed, U = {1, ..., np} \ L;
2: for iter = 1 : max_iter do
3:   Construct feature-semantic embedding P by Eq. (2);
4:   Initialize pseudo-labeled set L̃ = ∅;
5:   for c ∈ Ct do
6:     Compute sample-class similarity by Eq. (3);
7:     Select samples Sc = {j | s_cj = 1} for c by Eq. (4);
8:     Assign pseudo label c to x_j for j ∈ Sc; L̃ = L̃ ∪ Sc;
9:   end for
10:  Train LapSVM parameters w_c for ∀c ∈ Ct by Eq. (8);
11:  Select top-ranked samples S by Eq. (10) for labeling;
12:  Update L = L ∪ S and U = U \ S;
13: end for
14: Return w_c, ∀c ∈ Ct;
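The bookkeeping of Algorithm 1 can be sketched in a few lines of Python. This is only a skeleton: the cross-class steps (Eqs. 2-8, the semantic embedding, pseudo-labeling, and LapSVM training) and the informativeness ranking (Eq. 10) are replaced by trivial stand-ins, and the function name `cc_ss_al_loop` is ours, not the authors'. Only the maintenance of the labeled set L and the unlabeled pool U is faithful to the pseudocode.

```python
import random

def cc_ss_al_loop(n_pool, n_iters=3, per_iter=2, seed=0):
    """Skeleton of the CC-SS-AL outer loop (Algorithm 1); model steps are stubs."""
    rng = random.Random(seed)
    labeled = set(rng.sample(range(n_pool), per_iter))  # line 1: random seed set L
    unlabeled = set(range(n_pool)) - labeled            # U = {1, ..., np} \ L
    for _ in range(n_iters):                            # lines 2-13
        # Lines 3-10 would build the embedding P, pseudo-label source samples,
        # and train LapSVM classifiers w_c here.
        # Stand-in for Eq. (10): a real implementation would rank unlabeled
        # samples by informativeness; we just take the lowest indices.
        chosen = sorted(unlabeled)[:per_iter]
        labeled |= set(chosen)                          # line 12: L = L ∪ S
        unlabeled -= set(chosen)                        #          U = U \ S
    return labeled, unlabeled
```

Running the skeleton on a pool of 20 samples with the CIFAR10 batch size of 2 per iteration labels 2 seed samples plus 2 per iteration, leaving the rest in U.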
Open Source Code No The paper does not provide an explicit statement or link to its own open-source code.
Open Datasets | Yes | To demonstrate the effectiveness of the proposed method, we conduct experiments on three benchmark datasets. The first is CIFAR10 [Krizhevsky, 2009]... The second dataset is Animals with Attributes (AwA) [Lampert et al., 2014]... The third dataset is the aPascal-aYahoo (aPY) dataset [Farhadi et al., 2009].
Dataset Splits | Yes | We split the data in the target domain equally into two parts: one part acts as the pool Dp where the methods select samples for human labeling, and the other part is the test set Dt. To determine the model parameters for each model, e.g., the parameter C for SVM, the cross-validation (CV) strategy is employed.
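The pool/test protocol above amounts to a 50/50 partition of the target-domain indices. A minimal sketch, assuming a seeded random permutation (the paper does not specify the shuffling procedure, and the function name `split_pool_test` is ours):

```python
import random

def split_pool_test(n_target, seed=0):
    """Split target-domain indices equally into a labeling pool Dp and a
    test set Dt; for odd sizes the test set keeps the extra sample."""
    idx = list(range(n_target))
    random.Random(seed).shuffle(idx)
    half = n_target // 2
    return idx[:half], idx[half:]  # (Dp indices, Dt indices)
```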
Hardware Specification No The paper does not provide any specific hardware details such as GPU/CPU models, processors, or memory used for running experiments.
Software Dependencies | No | The paper mentions software like Caffe, MATLAB's quadprog function, and Python, but does not provide specific version numbers for any of them.
Experiment Setup | Yes | To determine the model parameters for each model, e.g., the parameter C for SVM, the cross-validation (CV) strategy is employed. Specifically, for the three baselines, we use the labeled source domain for CV. The parameter C for SVM and Cg for Laplacian regularization are chosen from {10^-3, 10^-2, ..., 10^2}. Following Guo et al. [2016], we use cross-class CV for our method. For CIFAR10, which has 8 classes in the source domain, we use 2 classes to simulate the target domain and the others as the source domain. The other two datasets are processed in a similar way. In CV, C1 and C2 in Eq. (4) and C and Cg in Eq. (8) are selected from {0.1, 1, 10}. In addition, we simply set β in Eq. (4) and λ in Eq. (10) to 1. In each iteration, 2, 10, and 12 samples are selected for labeling for CIFAR10, AwA, and aPY, respectively.
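The two hyperparameter searches described above are small exhaustive grids, so they are easy to enumerate. A sketch of both grids as lists of candidate settings (the function name `cv_grid` and the dict layout are our own; the value sets are the ones quoted from the paper):

```python
from itertools import product

def cv_grid(cross_class=False):
    """Candidate hyperparameter settings from the paper's setup.

    Baselines: C (SVM) and Cg (Laplacian regularization) over {1e-3, ..., 1e2}.
    Proposed method (cross-class CV): C1, C2 (Eq. 4) and C, Cg (Eq. 8)
    over {0.1, 1, 10}.
    """
    if cross_class:
        vals = [0.1, 1.0, 10.0]
        return [dict(zip(("C1", "C2", "C", "Cg"), combo))
                for combo in product(vals, repeat=4)]
    vals = [10.0 ** p for p in range(-3, 3)]  # 1e-3 ... 1e2
    return [{"C": c, "Cg": cg} for c, cg in product(vals, repeat=2)]
```

The baseline grid has 6 × 6 = 36 settings and the cross-class grid 3^4 = 81, which keeps the CV cost modest.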