On Gleaning Knowledge from Multiple Domains for Active Learning
Authors: Zengmao Wang, Bo Du, Lefei Zhang, Liangpei Zhang, Ruimin Hu, Dacheng Tao
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed method is verified on newsgroup and handwritten digit recognition tasks, where it outperforms state-of-the-art methods. We tested the proposed method on 20 tasks in newsgroup and handwritten digit recognition. |
| Researcher Affiliation | Collaboration | 1 School of Computer, Wuhan University; 2 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing; 3 National Engineering Research Center for Multimedia Software, School of Computer, Wuhan University; 4 UBTech Sydney AI Institute, The School of Information Technologies, The University of Sydney |
| Pseudocode | No | The paper provides mathematical formulations and descriptions of its algorithm, but it does not include a dedicated section or figure explicitly labeled as "Pseudocode" or "Algorithm" in a structured, code-like format. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | The 20 Newsgroups data set consists of a collection of approximately 20,000 newsgroup documents, partitioned into 20 different categories. The USPS and MNIST handwritten digit data sets [Long et al., 2014] represent the various fonts of each digit from 1 to 10 using 256-dimension features normalized to the range [0, 1]. |
| Dataset Splits | Yes | For the positive samples in each task, we randomly used 50% for testing, one sample as the initial labeled data, and the remaining (nearly 50%) as the unlabeled pool for active learning. For the negative samples in each task, we also randomly divided them into three parts: 20% for testing, 60% as the initial labeled data, and the remaining 20% as the unlabeled pool for active learning. (A minimal split sketch follows the table.) |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | For the classifier, without loss of generality, a support vector machine (SVM) with a Gaussian kernel was adopted with the LibSVM tool [Chang and Lin, 2011]. While LibSVM is mentioned, a specific version number is not provided. |
| Experiment Setup | Yes | There are two important parameters in the SVM classifier: the kernel width parameter g and the penalty parameter C. For convenience, we set the two parameters to empirical values of C = 100 and g = 0.05. For a fair comparison, we adopted the same kernel parameter in all the methods. For the methods with a tradeoff parameter, we fixed it as 10, as in [Huang and Chen, 2016]. At each iteration, five samples were selected for labeling, and we stopped the iteration loop when 20 iterations were reached. (A sketch of this loop follows the table.) |
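
The per-task split protocol quoted above is easy to misread, so here is a minimal sketch of how one task's data could be partitioned, assuming `pos` and `neg` are arrays holding that task's positive and negative samples. The function name, the inputs, and the fixed seed are our illustrative assumptions, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed is our assumption; the paper only says "randomly divided"

def split_task(pos, neg):
    """Partition one task's samples following the quoted protocol."""
    pos = rng.permutation(pos)
    neg = rng.permutation(neg)

    # Positives: 50% test, one sample as initial labeled data, the rest unlabeled.
    half = len(pos) // 2
    pos_test, pos_labeled, pos_unlabeled = pos[:half], pos[half:half + 1], pos[half + 1:]

    # Negatives: 20% test, 60% initial labeled data, 20% unlabeled.
    a, b = int(0.2 * len(neg)), int(0.8 * len(neg))
    neg_test, neg_labeled, neg_unlabeled = neg[:a], neg[a:b], neg[b:]

    return (pos_test, pos_labeled, pos_unlabeled,
            neg_test, neg_labeled, neg_unlabeled)
```

For example, `split_task(np.arange(100), np.arange(300))` yields 50/1/49 positives and 60/180/60 negatives, matching the stated ratios.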
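Likewise, a sketch of the classifier setup and iteration loop under the stated parameters. scikit-learn's `SVC` wraps the same libsvm library the paper uses, and mapping the paper's kernel width g onto `gamma` is our assumption. The query rule shown is plain uncertainty sampling, a stand-in for the paper's multi-domain selection criterion, which this table does not describe:

```python
import numpy as np
from sklearn.svm import SVC  # SVC wraps the same libsvm library as the LibSVM tool

def active_learning_loop(X_lab, y_lab, X_unlab, y_unlab, n_iters=20, batch=5):
    """Protocol from the paper: five queries per iteration, stop at 20 iterations.
    y_unlab stands in for the labeling oracle."""
    # Empirical values from the paper; gamma=g=0.05 is our mapping assumption.
    clf = SVC(kernel="rbf", C=100, gamma=0.05)
    for _ in range(n_iters):
        clf.fit(X_lab, y_lab)
        # Stand-in query strategy: pick the unlabeled samples closest to the
        # decision boundary (smallest absolute decision value).
        margins = np.abs(clf.decision_function(X_unlab))
        picked = np.argsort(margins)[:batch]
        # Move the queried samples from the unlabeled pool to the labeled set.
        X_lab = np.vstack([X_lab, X_unlab[picked]])
        y_lab = np.concatenate([y_lab, y_unlab[picked]])
        keep = np.setdiff1d(np.arange(len(X_unlab)), picked)
        X_unlab, y_unlab = X_unlab[keep], y_unlab[keep]
    return clf
```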