Semi-Supervised Active Learning with Cross-Class Sample Transfer

Authors: Yuchen Guo, Guiguang Ding, Yue Gao, Jianmin Wang

IJCAI 2016

Reproducibility — Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on three datasets verify the efficacy of the proposed method. We carry out comprehensive empirical analysis on three benchmark datasets. The results show that the proposed CC-SS-AL requires much fewer labeled samples in the target domain than the conventional SS-AL methods to achieve the same accuracy, which validates its efficacy.
Researcher Affiliation | Academia | Tsinghua National Laboratory for Information Science and Technology (TNList), School of Software, Tsinghua University, Beijing 100084, China. {yuchen.w.guo,kevin.gaoy}@gmail.com, {dinggg,jimwang}@tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1 CC-SS-AL
Input: Source domain data Ds, target domain pool Dp; label semantic vector a_c for all c ∈ Cs ∪ Ct;
Output: Classifiers w_c for the target domain, ∀c ∈ Ct;
1: Initialize L by random seed, U = {1, ..., np} \ L;
2: for iter = 1 : max_iter do
3:   Construct feature-semantic embedding P by Eq. (2);
4:   Initialize pseudo-labeled set L̃ = ∅;
5:   for c ∈ Ct do
6:     Compute sample-class similarity by Eq. (3);
7:     Select samples Sc = {j | s_cj = 1} for c by Eq. (4);
8:     Assign pseudo label c to x_j for j ∈ Sc; L̃ = L̃ ∪ Sc;
9:   end for
10:  Train LapSVM parameters w_c for ∀c ∈ Ct by Eq. (8);
11:  Select top-ranked samples S by Eq. (10) for labeling;
12:  Update L = L ∪ S and U = U \ S;
13: end for
14: Return w_c, ∀c ∈ Ct;
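The bookkeeping of Algorithm 1 can be sketched in a few lines of Python. This is only a skeleton: the cross-class steps (Eqs. 2-8, the semantic embedding, pseudo-labeling, and LapSVM training) and the informativeness ranking (Eq. 10) are replaced by trivial stand-ins, and the function name `cc_ss_al_loop` is ours, not the authors'. Only the maintenance of the labeled set L and the unlabeled pool U is faithful to the pseudocode.

```python
import random

def cc_ss_al_loop(n_pool, n_iters=3, per_iter=2, seed=0):
    """Skeleton of the CC-SS-AL outer loop (Algorithm 1); model steps are stubs."""
    rng = random.Random(seed)
    labeled = set(rng.sample(range(n_pool), per_iter))  # line 1: random seed set L
    unlabeled = set(range(n_pool)) - labeled            # U = {1, ..., np} \ L
    for _ in range(n_iters):                            # lines 2-13
        # Lines 3-10 would build the embedding P, pseudo-label source samples,
        # and train LapSVM classifiers w_c here.
        # Stand-in for Eq. (10): a real implementation would rank unlabeled
        # samples by informativeness; we just take the lowest indices.
        chosen = sorted(unlabeled)[:per_iter]
        labeled |= set(chosen)                          # line 12: L = L ∪ S
        unlabeled -= set(chosen)                        #          U = U \ S
    return labeled, unlabeled
```

Running the skeleton on a pool of 20 samples with the CIFAR10 batch size of 2 per iteration labels 2 seed samples plus 2 per iteration, leaving the rest in U.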
Open Source Code No The paper does not provide an explicit statement or link to its own open-source code.
Open Datasets | Yes | To demonstrate the effectiveness of the proposed method, we conduct experiments on three benchmark datasets. The first is CIFAR10 [Krizhevsky, 2009]... The second dataset is Animals with Attributes (AwA) [Lampert et al., 2014]... The third dataset is the aPascal-aYahoo (aPY) dataset [Farhadi et al., 2009].
Dataset Splits | Yes | We split the data in the target domain equally into two parts: one part acts as the pool Dp where the methods select samples for human labeling, and the other part is the test set Dt. To determine the model parameters for each model, e.g., the parameter C for SVM, the cross-validation (CV) strategy is employed.
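The pool/test protocol above amounts to a 50/50 partition of the target-domain indices. A minimal sketch, assuming a seeded random permutation (the paper does not specify the shuffling procedure, and the function name `split_pool_test` is ours):

```python
import random

def split_pool_test(n_target, seed=0):
    """Split target-domain indices equally into a labeling pool Dp and a
    test set Dt; for odd sizes the test set keeps the extra sample."""
    idx = list(range(n_target))
    random.Random(seed).shuffle(idx)
    half = n_target // 2
    return idx[:half], idx[half:]  # (Dp indices, Dt indices)
```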
Hardware Specification No The paper does not provide any specific hardware details such as GPU/CPU models, processors, or memory used for running experiments.
Software Dependencies | No | The paper mentions software like Caffe, MATLAB's quadprog function, and Python, but does not provide specific version numbers for any of them.
Experiment Setup | Yes | To determine the model parameters for each model, e.g., the parameter C for SVM, the cross-validation (CV) strategy is employed. Specifically, for the three baselines, we use the labeled source domain for CV. The parameter C for SVM and Cg for Laplacian regularization are chosen from {10^-3, 10^-2, ..., 10^2}. Following Guo et al. [2016], we use cross-class CV for our method. For CIFAR10, which has 8 classes in the source domain, we use 2 classes to simulate the target domain and the others as the source domain. The other two datasets are processed in a similar way. In CV, C1 and C2 in Eq. (4) and C and Cg in Eq. (8) are selected from {0.1, 1, 10}. In addition, we simply set β in Eq. (4) and λ in Eq. (10) to 1. In each iteration, 2, 10, and 12 samples are selected for labeling for CIFAR10, AwA, and aPY, respectively.
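The two hyperparameter searches described above are small exhaustive grids, so they are easy to enumerate. A sketch of both grids as lists of candidate settings (the function name `cv_grid` and the dict layout are our own; the value sets are the ones quoted from the paper):

```python
from itertools import product

def cv_grid(cross_class=False):
    """Candidate hyperparameter settings from the paper's setup.

    Baselines: C (SVM) and Cg (Laplacian regularization) over {1e-3, ..., 1e2}.
    Proposed method (cross-class CV): C1, C2 (Eq. 4) and C, Cg (Eq. 8)
    over {0.1, 1, 10}.
    """
    if cross_class:
        vals = [0.1, 1.0, 10.0]
        return [dict(zip(("C1", "C2", "C", "Cg"), combo))
                for combo in product(vals, repeat=4)]
    vals = [10.0 ** p for p in range(-3, 3)]  # 1e-3 ... 1e2
    return [{"C": c, "Cg": cg} for c, cg in product(vals, repeat=2)]
```

The baseline grid has 6 × 6 = 36 settings and the cross-class grid 3^4 = 81, which keeps the CV cost modest.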