Compositional Generalization for Multi-Label Text Classification: A Data-Augmentation Approach
Authors: Yuyang Chai, Zhuang Li, Jiahui Liu, Lei Chen, Fei Li, Donghong Ji, Chong Teng
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results show that these models often fail to generalize to compositional concepts encountered infrequently during training, leading to inferior performance on tests with these new combinations. To address this, we introduce a data augmentation method that leverages two innovative text generation models designed to enhance the classification models' capacity for compositional generalization. Our experiments show that this data augmentation approach significantly improves the compositional generalization capabilities of classification models on our benchmarks, with both generation models surpassing other text generation baselines. |
| Researcher Affiliation | Academia | Yuyang Chai (1, *), Zhuang Li (2, *), Jiahui Liu (1), Lei Chen (1), Fei Li (1), Donghong Ji (1), Chong Teng (1); (1) Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University; (2) Faculty of Information Technology, Monash University |
| Pseudocode | No | The paper describes models and equations for its proposed methods (LS-PT, LD-VAE), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/yychai74/LD-VAE. |
| Open Datasets | Yes | We conduct experiments on the compositional splits of three datasets: SemEval (Mohammad et al. 2018), AAPD (Yang et al. 2018), and IMDB (Maiya 2019). |
| Dataset Splits | Yes | After splitting, SemEval, a multi-label emotion classification dataset, comprises 9,530 training, 50 support, and 1,403 test examples. AAPD features academic paper abstracts annotated with subject categories from Arxiv and contains 50,481 training, 50 support, and 5,309 test examples. IMDB provides movie reviews annotated with movie genres and includes a total of 107,944 training, 50 support, and 9,200 test samples. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU models, memory) used to run the experiments. It mentions using pre-trained models like BERT, T5, and GPT2, but not the computational resources for their own experiments. |
| Software Dependencies | No | The paper mentions various models and tools used (e.g., GPT2, T5, BERT, Flan-T5, GPT3.5, GRU, MLP) and cites their original papers, but it does not specify version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or other key dependencies used in their implementation. |
| Experiment Setup | No | The paper describes data augmentation quantities (e.g., "overgenerate 2,000, 10,000, and 24,000 examples, and then apply quality control to filter the synthetic data down to sizes of 1,000, 5,000, and 12,000"; a minimal sketch of this overgenerate-then-filter step follows the table), but it lacks specific hyperparameters or system-level training settings for the models themselves (e.g., learning rate, batch size, number of epochs, optimizer details). |
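
The Experiment Setup row mentions an overgenerate-then-filter augmentation pipeline. The sketch below is a minimal illustration of that idea, not the authors' released implementation (see the repository linked above); in particular, the `score_fn` that ranks candidates by a quality score is an assumption for illustration, since the paper's exact quality-control criterion is not restated in this report.

```python
# Hedged sketch of an overgenerate-then-filter augmentation step.
# Assumption: synthetic candidates are ranked by some quality score
# (e.g., a classifier's confidence) and only the top-k are kept.
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class SyntheticExample:
    text: str
    labels: Sequence[str]  # the (rare) label combination this example targets


def filter_overgenerated(
    candidates: List[SyntheticExample],
    score_fn: Callable[[SyntheticExample], float],  # hypothetical quality score
    keep: int,
) -> List[SyntheticExample]:
    """Keep the `keep` highest-scoring synthetic examples."""
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:keep]


if __name__ == "__main__":
    # Toy usage: overgenerate four candidates, keep the top two by a dummy score.
    pool = [
        SyntheticExample("short", ["joy"]),
        SyntheticExample("a somewhat longer synthetic review", ["joy", "surprise"]),
        SyntheticExample("medium length text", ["anger"]),
        SyntheticExample("the longest synthetic candidate in this toy pool", ["fear", "joy"]),
    ]
    kept = filter_overgenerated(pool, score_fn=lambda ex: len(ex.text), keep=2)
    print([ex.text for ex in kept])
```

In the paper's reported setting, `keep` would correspond to 1,000, 5,000, or 12,000 retained examples out of 2,000, 10,000, or 24,000 overgenerated candidates, respectively.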