Compositional Generalization for Multi-Label Text Classification: A Data-Augmentation Approach
Authors: Yuyang Chai, Zhuang Li, Jiahui Liu, Lei Chen, Fei Li, Donghong Ji, Chong Teng
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results show that these models often fail to generalize to compositional concepts encountered infrequently during training, leading to inferior performance on tests with these new combinations. To address this, we introduce a data augmentation method that leverages two innovative text generation models designed to enhance the classification models' capacity for compositional generalization. Our experiments show that this data augmentation approach significantly improves the compositional generalization capabilities of classification models on our benchmarks, with both generation models surpassing other text generation baselines. |
| Researcher Affiliation | Academia | Yuyang Chai (1, *), Zhuang Li (2, *), Jiahui Liu (1), Lei Chen (1), Fei Li (1), Donghong Ji (1), Chong Teng (1); (1) Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University; (2) Faculty of Information Technology, Monash University |
| Pseudocode | No | The paper describes models and equations for its proposed methods (LS-PT, LD-VAE), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/yychai74/LD-VAE. |
| Open Datasets | Yes | We conduct experiments on the compositional splits of three datasets: SemEval (Mohammad et al. 2018), AAPD (Yang et al. 2018), and IMDB (Maiya 2019). |
| Dataset Splits | Yes | After splitting, SemEval, a multi-label emotion classification dataset, comprises 9,530 training, 50 support, and 1,403 test examples. AAPD features academic paper abstracts annotated with subject categories from Arxiv and contains 50,481 training, 50 support, and 5,309 test examples. IMDB provides movie reviews annotated with movie genres and includes a total of 107,944 training, 50 support, and 9,200 test samples. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU models, memory) used to run the experiments. It mentions using pre-trained models like BERT, T5, and GPT2, but not the computational resources for their own experiments. |
| Software Dependencies | No | The paper mentions various models and tools used (e.g., GPT2, T5, BERT, Flan-T5, GPT3.5, GRU, MLP) and cites their original papers, but it does not specify version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or other key dependencies used in their implementation. |
| Experiment Setup | No | The paper describes data augmentation quantities (e.g., "overgenerate 2,000, 10,000, and 24,000 examples, and then apply quality control to filter the synthetic data down to sizes of 1,000, 5,000, and 12,000"; a minimal sketch of this overgenerate-then-filter step follows the table), but it lacks specific hyperparameters or system-level training settings for the models themselves (e.g., learning rate, batch size, number of epochs, optimizer details). |
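
The Experiment Setup row mentions an overgenerate-then-filter augmentation pipeline. The sketch below is a minimal illustration of that idea, not the authors' released implementation (see the repository linked above); in particular, the `score_fn` that ranks candidates by a quality score is an assumption for illustration, since the paper's exact quality-control criterion is not restated in this report.

```python
# Hedged sketch of an overgenerate-then-filter augmentation step.
# Assumption: synthetic candidates are ranked by some quality score
# (e.g., a classifier's confidence) and only the top-k are kept.
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class SyntheticExample:
    text: str
    labels: Sequence[str]  # the (rare) label combination this example targets


def filter_overgenerated(
    candidates: List[SyntheticExample],
    score_fn: Callable[[SyntheticExample], float],  # hypothetical quality score
    keep: int,
) -> List[SyntheticExample]:
    """Keep the `keep` highest-scoring synthetic examples."""
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return ranked[:keep]


if __name__ == "__main__":
    # Toy usage: overgenerate four candidates, keep the top two by a dummy score.
    pool = [
        SyntheticExample("short", ["joy"]),
        SyntheticExample("a somewhat longer synthetic review", ["joy", "surprise"]),
        SyntheticExample("medium length text", ["anger"]),
        SyntheticExample("the longest synthetic candidate in this toy pool", ["fear", "joy"]),
    ]
    kept = filter_overgenerated(pool, score_fn=lambda ex: len(ex.text), keep=2)
    print([ex.text for ex in kept])
```

In the paper's reported setting, `keep` would correspond to 1,000, 5,000, or 12,000 retained examples out of 2,000, 10,000, or 24,000 overgenerated candidates, respectively.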