Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
On Gleaning Knowledge from Multiple Domains for Active Learning
Authors: Zengmao Wang, Bo Du, Lefei Zhang, Liangpei Zhang, Ruimin Hu, Dacheng Tao
IJCAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed method is verified with newsgroups and handwritten digits data recognition tasks, where it outperforms the state-of-the-art methods. We tested the proposed method on 20 tasks in newsgroup and handwritten digit recognition. |
| Researcher Affiliation | Collaboration | 1 School of Computer, Wuhan University 2 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing 3 National Engineering Research Center for Multimedia Software, School of Computer, Wuhan University 4 UBTech Sydney AI Institute, The School of Information Technologies, The University of Sydney |
| Pseudocode | No | The paper provides mathematical formulations and descriptions of its algorithm, but it does not include a dedicated section or figure explicitly labeled as "Pseudocode" or "Algorithm" in a structured, code-like format. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | The 20 Newsgroups data set consists of a collection of approximately 20,000 newsgroup documents, partitioned into 20 different categories. The USPS and MNIST handwritten digit data sets [Long et al., 2014] represent the various fonts of each digit from 1 to 10 using 256-dimension features normalized to the range [0, 1]. |
| Dataset Splits | Yes | For the positive samples in each task, 50% for testing, one sample as the initial labeled data, and the other near 50% as the unlabeled data for the active learning. For the negative samples in each task, we also randomly divided them into three parts: 20% for testing, 60% as the initial labeled data, and the other 20% as the unlabeled data for the active learning. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | For the classifier, without loss of generality, support vector machine (SVM) with a Gaussian kernel was adopted with the LIBSVM tool [Chang and Lin, 2011]. While LIBSVM is mentioned, a specific version number is not provided. |
| Experiment Setup | Yes | There are two important parameters in the SVM classifier: the kernel width parameter g and the penalty parameter C. For convenience, we set the two parameters with empirical values of C = 100 and g = 0.05. For a fair comparison, we adopted the same kernel parameter in all the methods. For the methods with a tradeoff parameter, we fixed it as 10, as in [Huang and Chen, 2016]. At each iteration, five samples were selected for labeling, and we stopped the iteration loop when 20 iterations were reached. |
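
The Dataset Splits and Experiment Setup rows together describe a concrete protocol, which the following minimal sketch illustrates. It is not the authors' code. The function names (`split_task`, `run_query_loop`, `random_batch`), the sample counts, and the random selection rule are all hypothetical placeholders; the paper's actual query strategy is not described in this summary, and no classifier is trained here.

```python
import random

# Values quoted in the Experiment Setup row.
SVM_C = 100          # penalty parameter C
SVM_GAMMA = 0.05     # Gaussian kernel width parameter g
BATCH_SIZE = 5       # samples selected for labeling per iteration
NUM_ITERATIONS = 20  # the loop stops after this many iterations


def split_task(pos_ids, neg_ids, seed=0):
    """Split one task's sample ids into test, initial labeled, and unlabeled sets."""
    rng = random.Random(seed)
    pos, neg = list(pos_ids), list(neg_ids)
    rng.shuffle(pos)
    rng.shuffle(neg)

    # Positives: 50% for testing, one initial labeled sample, the rest unlabeled.
    n_pos_test = len(pos) // 2
    pos_test = pos[:n_pos_test]
    pos_labeled = pos[n_pos_test:n_pos_test + 1]
    pos_pool = pos[n_pos_test + 1:]

    # Negatives: 20% for testing, 60% initial labeled, the remainder unlabeled.
    n_neg_test = len(neg) // 5
    n_neg_labeled = (len(neg) * 3) // 5
    neg_test = neg[:n_neg_test]
    neg_labeled = neg[n_neg_test:n_neg_test + n_neg_labeled]
    neg_pool = neg[n_neg_test + n_neg_labeled:]

    return {
        "test": pos_test + neg_test,
        "labeled": pos_labeled + neg_labeled,
        "unlabeled": pos_pool + neg_pool,
    }


def random_batch(pool, k, rng):
    """Placeholder selection rule (assumption): pick k pool samples at random."""
    return rng.sample(pool, k)


def run_query_loop(split, select_batch, seed=0):
    """Move BATCH_SIZE samples per iteration from the pool to the labeled set."""
    rng = random.Random(seed)
    labeled = list(split["labeled"])
    pool = list(split["unlabeled"])
    for _ in range(NUM_ITERATIONS):
        if not pool:
            break
        batch = select_batch(pool, min(BATCH_SIZE, len(pool)), rng)
        chosen = set(batch)
        labeled.extend(batch)
        pool = [i for i in pool if i not in chosen]
        # A real run would retrain the SVM on `labeled` and score it on the
        # test set at this point; that step is omitted in this sketch.
    return labeled, pool


# Hypothetical task with 400 positive and 1000 negative sample ids.
split = split_task(range(0, 400), range(400, 1400))
labeled, pool = run_query_loop(split, random_batch)
```

With five samples per iteration and 20 iterations, at most 100 labels are requested per task. If the paper's g is LIBSVM's gamma (an assumption), the quoted settings would correspond to `svm-train -c 100 -g 0.05` with the default RBF kernel.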