CoLAL: Co-learning Active Learning for Text Classification

Authors: Linh Le, Genghong Zhao, Xia Zhang, Guido Zuccon, Gianluca Demartini

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through theoretical analysis and experimental validation, we reveal that the integration of noisy labels into the peer model effectively identifies the target model's potential inaccuracies. We evaluated the CoLAL method across seven benchmark datasets: four text datasets (AGNews, DBPedia, PubMed, SST-2) against text-based state-of-the-art (SOTA) baselines, and three image datasets (CIFAR100, MNIST, OpenML-155) against computer vision SOTA baselines. The results show that our CoLAL method significantly outperforms existing SOTA in text-based AL, and is competitive with SOTA image-based AL techniques.
Researcher Affiliation | Collaboration | Linh Le1, Genghong Zhao2, Xia Zhang3, Guido Zuccon1, Gianluca Demartini1 — 1The University of Queensland; 2Neusoft Research of Intelligent Healthcare Technology, Co. Ltd.; 3Neusoft Corporation
Pseudocode | Yes | Algorithm 1: CoLAL algorithm
Open Source Code | No | No explicit statement or link indicating the release of the authors' own source code for the described methodology.
Open Datasets | Yes | In this study, we consider four benchmark text classification datasets that were used to evaluate SOTA baselines (Yu et al. 2022; Margatina et al. 2021). The first dataset, AGNews (Zhang, Zhao, and Lecun 2015)... The second dataset, DBPedia (Zhang, Zhao, and Lecun 2015)... The third dataset, PubMed (Dernoncourt and Lee 2017)... Finally, the fourth dataset, SST-2 (Socher et al. 2021)... We also consider three benchmark datasets to evaluate image classification SOTA baselines, as in (Parvaneh et al. 2022). These are CIFAR100 (Krizhevsky 2009), MNIST (LeCun et al. 1998) and OpenML. CIFAR100 includes 100 classes of 32x32 coloured images, featuring 50,000 images for training and 10,000 images for testing. MNIST includes 10 classes of 28x28 binary images depicting handwritten single digits, with a training set of 50,000 images and a test set of 10,000 images. Additionally, we have selected the OpenML-155 dataset, which consists of 9 classes of metadata samples, totaling 50,000 training samples and 10,000 test samples, as configured in (Parvaneh et al. 2022).
Dataset Splits | Yes | The first dataset, AGNews (Zhang, Zhao, and Lecun 2015), is focused on news topic classification and comprises 4 classes, with 119,000 training samples, 1,000 development samples, and 7,600 test samples. The second dataset, DBPedia (Zhang, Zhao, and Lecun 2015), is designed for Wikipedia topic classification and encompasses 14 classes, with 280,000 training samples, 1,000 development samples, and 70,000 test samples. The third dataset, PubMed (Dernoncourt and Lee 2017), is used for medical abstract classification and includes 5 classes, with 180,000 training samples, 1,000 development samples, and 30,100 test samples. Finally, the fourth dataset, SST-2 (Socher et al. 2021), is used for sentiment analysis and contains 2 classes, with 60,600 training samples, 800 development samples, and 1,800 test samples. ... Due to the impracticality of large development sets in low-resource settings (Kann, Cho, and Bowman 2019), we limit the size of the development set to 1,000, which is the same as the labeling budget.
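The split figures quoted above can be summarized in a small machine-readable table; the sketch below simply mirrors the counts reported in the excerpt, and the `check_dev_cap` helper name is illustrative, not from the paper.

```python
# Reported text-classification splits (counts quoted from the paper excerpt).
SPLITS = {
    # dataset: (classes, train, dev, test)
    "AGNews":  (4,  119_000, 1_000, 7_600),
    "DBPedia": (14, 280_000, 1_000, 70_000),
    "PubMed":  (5,  180_000, 1_000, 30_100),
    "SST-2":   (2,  60_600,  800,   1_800),
}

DEV_CAP = 1_000  # dev sets are limited to 1,000 samples, matching the labeling budget


def check_dev_cap(splits: dict) -> bool:
    """Verify every development set respects the reported 1,000-sample cap."""
    return all(dev <= DEV_CAP for _, _, dev, _ in splits.values())


if __name__ == "__main__":
    print(check_dev_cap(SPLITS))  # every quoted dev split is within the cap
```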
Hardware Specification | Yes | Our experiments are executed with 5 different random seeds on a GPU cluster with 16GB NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper mentions using "RoBERTa-base (Liu et al. 2019b) from the Hugging Face codebase (Wolf et al. 2020)" and "SciBERT (Beltagy, Lo, and Cohan 2019)", along with the "Sequence Classification backbone from Hugging Face". However, specific version numbers for the Hugging Face codebase, PyTorch, or other key software dependencies are not provided.
Experiment Setup | Yes | The configuration for the target model includes training for 15 epochs, using a batch size of 8, a learning rate of 2e-5, and a weight decay of 1e-8. Additionally, we utilize the Sequence Classification backbone from Hugging Face for our classification tasks, ensuring compatibility and consistency across experiments. ... While the peer model fξ is trained on a large amount of unlabeled data for which we generate noisy labels and a small number of epochs (2 epochs), the target model fψ is trained on a small amount of good-quality labeled data and a higher number of training epochs (15 epochs).
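The quoted setup can be condensed into a configuration sketch. This is not the authors' code: the `TrainConfig` class is illustrative, and only the target model's hyperparameters and the peer model's epoch count are stated in the excerpt; reusing the target model's remaining values for the peer model is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the setup above (field names are illustrative)."""
    epochs: int
    batch_size: int
    learning_rate: float
    weight_decay: float


# Target model f_psi: small, good-quality labeled set, more epochs.
target_cfg = TrainConfig(epochs=15, batch_size=8,
                         learning_rate=2e-5, weight_decay=1e-8)

# Peer model f_xi: large noisily-labeled set, few epochs (2, per the excerpt).
# The excerpt does not report its other hyperparameters; copying the target
# model's values here is an assumption.
peer_cfg = TrainConfig(epochs=2, batch_size=8,
                       learning_rate=2e-5, weight_decay=1e-8)
```

The asymmetry in `epochs` (15 vs. 2) reflects the report's key point: the target model trains longer on clean labels, while the peer model trains briefly on many noisy labels.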