On Gleaning Knowledge from Multiple Domains for Active Learning

Authors: Zengmao Wang, Bo Du, Lefei Zhang, Liangpei Zhang, Ruimin Hu, Dacheng Tao

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The proposed method is verified on newsgroup and handwritten digit recognition tasks, where it outperforms the state-of-the-art methods. We tested the proposed method on 20 tasks in newsgroup and handwritten digit recognition.
Researcher Affiliation | Collaboration | 1) School of Computer, Wuhan University; 2) State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing; 3) National Engineering Research Center for Multimedia Software, School of Computer, Wuhan University; 4) UBTech Sydney AI Institute, The School of Information Technologies, The University of Sydney
Pseudocode | No | The paper provides mathematical formulations and descriptions of its algorithm, but it does not include a dedicated section or figure explicitly labeled "Pseudocode" or "Algorithm" in a structured, code-like format.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | The 20 Newsgroups data set consists of a collection of approximately 20,000 newsgroup documents, partitioned into 20 different categories. The USPS and MNIST handwritten digit data sets [Long et al., 2014] represent the various fonts of each digit from 1 to 10 using 256-dimension features normalized to the range [0, 1]. (A hedged loading sketch follows the table.)
Dataset Splits | Yes | The positive samples in each task were randomly divided into three parts: 50% for testing, one sample as the initial labeled data, and the remaining (nearly 50%) as the unlabeled pool for active learning. The negative samples in each task were also randomly divided into three parts: 20% for testing, 60% as the initial labeled data, and the remaining 20% as the unlabeled pool for active learning. (A sketch of this split follows the table.)
Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | For the classifier, without loss of generality, a support vector machine (SVM) with a Gaussian kernel was adopted via the LibSVM tool [Chang and Lin, 2011]. While LibSVM is mentioned, a specific version number is not provided.
Experiment Setup | Yes | There are two important parameters in the SVM classifier: the kernel width parameter g and the penalty parameter C. For convenience, we set them to the empirical values C = 100 and g = 0.05. For a fair comparison, we adopted the same kernel parameter in all the methods. For the methods with a tradeoff parameter, we fixed it at 10, as in [Huang and Chen, 2016]. At each iteration, five samples were selected for labeling, and the loop was stopped after 20 iterations. (A sketch of this loop follows the table.)
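
A minimal loading sketch for the 20 Newsgroups corpus referenced in the Open Datasets row. The paper does not describe its loading or featurization pipeline, so the scikit-learn fetcher and the TF-IDF features here are assumptions, not the authors' setup.

    # Sketch: load 20 Newsgroups (~20,000 documents across 20 categories).
    # Assumption: scikit-learn's fetcher and TF-IDF features; the paper
    # does not specify its preprocessing.
    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import TfidfVectorizer

    newsgroups = fetch_20newsgroups(subset="all")
    X = TfidfVectorizer(max_features=5000).fit_transform(newsgroups.data)
    y = newsgroups.target
    print(X.shape, len(set(y)))  # about (18846, 5000); 20 categories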
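
A sketch of the per-task partition described in the Dataset Splits row, assuming uniform random shuffling (the paper does not state the sampling mechanism); split_task and its index arguments are hypothetical names.

    import numpy as np

    def split_task(pos_idx, neg_idx, rng):
        # Positive class: 50% test, 1 initially labeled sample, rest unlabeled.
        pos = rng.permutation(pos_idx)
        half = len(pos) // 2
        pos_test, pos_lab, pos_unlab = pos[:half], pos[half:half + 1], pos[half + 1:]
        # Negative class: 20% test, 60% initially labeled, 20% unlabeled.
        neg = rng.permutation(neg_idx)
        n20, n60 = int(0.2 * len(neg)), int(0.6 * len(neg))
        neg_test, neg_lab, neg_unlab = neg[:n20], neg[n20:n20 + n60], neg[n20 + n60:]
        labeled = np.concatenate([pos_lab, neg_lab])
        unlabeled = np.concatenate([pos_unlab, neg_unlab])
        test = np.concatenate([pos_test, neg_test])
        return labeled, unlabeled, test

    rng = np.random.default_rng(0)
    labeled, unlabeled, test = split_task(np.arange(100), np.arange(100, 600), rng)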
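
A sketch of the experimental loop in the Experiment Setup row, using scikit-learn's SVC (RBF kernel, gamma = 0.05, C = 100) in place of LibSVM. The query rule below is plain margin-based uncertainty sampling, a stand-in for the paper's multi-domain selection criterion, which is not reproduced here.

    import numpy as np
    from sklearn.svm import SVC

    def run_active_learning(X, y, labeled, unlabeled, n_iters=20, batch=5):
        # Settings from the paper: C = 100, g = 0.05, five queries per
        # iteration, 20 iterations. The selection rule (distance to the
        # decision boundary) is an assumption, not the paper's criterion.
        labeled, unlabeled = list(labeled), list(unlabeled)
        clf = SVC(kernel="rbf", gamma=0.05, C=100)
        for _ in range(n_iters):
            clf.fit(X[labeled], y[labeled])
            margins = np.abs(clf.decision_function(X[unlabeled]))
            picks = np.argsort(margins)[:batch]  # closest to the boundary
            for i in sorted(picks, reverse=True):
                labeled.append(unlabeled.pop(i))
        return clf, labeled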