Active Learning with Query Generation for Cost-Effective Text Classification

Authors: Yi-Fan Yan, Sheng-Jun Huang, Shaoyi Chen, Meng Liao, Jin Xu

AAAI 2020, pp. 6583-6590 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on different datasets demonstrate that the proposed approach can effectively improve the classification performance while significantly reducing the annotation cost.
Researcher Affiliation | Collaboration | Yi-Fan Yan (1,2), Sheng-Jun Huang (1,2), Shaoyi Chen (3), Meng Liao (3), Jin Xu (3). 1: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106; 2: MIIT Key Laboratory of Pattern Analysis and Machine Intelligence; 3: Data Quality Team, WeChat, Tencent Inc., China.
Pseudocode | Yes | Algorithm 1: The ALQG Algorithm.
Open Source Code | No | The paper does not provide an explicit statement about releasing its source code, nor a link to a code repository for the described method.
Open Datasets | Yes | IT-vs-Learning, Healthy-vs-Auto, and Culture-vs-Military are three Chinese public datasets for binary text classification (Wang et al. 2008), available at http://www.sogou.com/labs/resource/list_pingce.php. World-vs-Sports is an English public dataset (Zhang, Zhao, and LeCun 2015), available at https://github.com/mhjabreel/CharCNN/tree/master/data.
Dataset Splits | No | The paper specifies a 20% test set and 1% initial labeled data drawn from the remaining 80%, but does not explicitly mention a separate validation split.
Hardware Specification | No | The paper mentions using the TensorFlow framework but does not specify any hardware details (e.g., CPU or GPU models, memory) used for the experiments.
Software Dependencies | No | The paper mentions the TensorFlow framework, the Jieba text segmentation toolbox, and a pretrained word-embedding model, but none include specific version numbers.
Experiment Setup | Yes | In the experiments, the hyperparameter is set to λ = 5 by default for all datasets, and the batch size is set to b = 20. A single hidden-layer neural network is employed for query generation: 10 nodes in the input layer, 200 nodes in the hidden layer, and an output layer whose size equals the dimensionality of the feature space.
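The reported setup can be sketched as a minimal forward pass of the query-generation network: a single hidden-layer MLP with 10 input nodes, 200 hidden nodes, and an output layer sized to the feature space, processing a batch of b = 20 instances. The feature dimensionality (1000), the ReLU activation, and the random weight initialization below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Architecture sizes from the paper's experiment setup; feature_dim is an
# assumed placeholder for the dimensionality of the feature space.
input_dim, hidden_dim, feature_dim = 10, 200, 1000
batch_size = 20  # b = 20 in the paper

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(input_dim, hidden_dim))   # input -> hidden
b1 = np.zeros(hidden_dim)
W2 = rng.normal(scale=0.1, size=(hidden_dim, feature_dim))  # hidden -> output
b2 = np.zeros(feature_dim)

def generate_queries(x):
    """Single hidden-layer forward pass (ReLU activation assumed)."""
    h = np.maximum(0.0, x @ W1 + b1)
    return h @ W2 + b2

# One generated query vector in feature space per instance in the batch.
queries = generate_queries(rng.normal(size=(batch_size, input_dim)))
print(queries.shape)  # (20, 1000)
```

In the paper this network is trained as part of the ALQG procedure; the sketch only fixes the layer sizes and batch shape described in the setup.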