C2C-GenDA: Cluster-to-Cluster Generation for Data Augmentation of Slot Filling

Authors: Yutai Hou, Sanyuan Chen, Wanxiang Che, Cheng Chen, Ting Liu
Pages: 13027-13035

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on ATIS and Snips datasets show that instances augmented by C2C-GenDA improve slot filling by 7.99 (11.9%) and 5.76 (13.6%) F-scores respectively, when there are only hundreds of training utterances.
Researcher Affiliation | Academia | Yutai Hou*, Sanyuan Chen*, Wanxiang Che, Cheng Chen, Ting Liu. Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China. {ythou, sychen, car, tliu}@ir.hit.edu.cn, 170400202@stu.hit.edu.cn
Pseudocode | Yes | Algorithm 1: Dispersed Cluster Pairing
Open Source Code | Yes | Code: https://github.com/Sanyuan-Chen/C2C-DA.
Open Datasets | Yes | We conduct experiments on ATIS and Snips datasets. ATIS (Hemphill, Godfrey, and Doddington 1990) is extensively used for slot filling and provides a well-founded comparison for data augmentation methods. [...] Snips (Coucke et al. 2018) dataset is collected from the Snips personal voice assistant.
Dataset Splits | Yes | We use a development set of 500 instances. [...] We use another 700 utterances as the development set.
Hardware Specification | No | The paper mentions using a transformer model and GPT-2, but does not specify any hardware details such as the GPU/CPU models used for training or inference.
Software Dependencies | No | The paper mentions software components such as the 'transformer implemented by Wolf et al. (2019)', 'GPT-2', the 'AdamW (Loshchilov and Hutter 2019) optimizer', 'Bi-LSTM', 'GloVe (Pennington, Socher, and Manning 2014)', and 'Adam (Kingma and Ba 2015)', but it does not provide specific version numbers for any of these software components or libraries.
Experiment Setup | Yes | We used the AdamW (Loshchilov and Hutter 2019) optimizer with initial learning rate 6.25e-5 or 5e-5 for training. We varied λ in {0.1, 0.02, 0.01, 0.002, 0.001} and set γ as 1.0. [...] The dimension of word embeddings and hidden states was set to 300 and 128, respectively. We used GloVe (Pennington, Socher, and Manning 2014) to initialize word embeddings. We varied the training batch size in {16, 128}, set the dropout rate to 0.5, and trained the model with Adam as suggested by Kingma and Ba (2015).
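
As a quick consistency check on the Research Type row (a back-of-the-envelope calculation, not a figure reported in the paper), the absolute and relative improvements can be combined to recover the implied unaugmented baselines, assuming the quoted percentages are relative gains:

    # Back-of-the-envelope check (assumption: the quoted percentages are relative
    # gains over the unaugmented baseline; the baselines themselves are not stated
    # in this row).
    for name, abs_gain, rel_gain in [("ATIS", 7.99, 0.119), ("Snips", 5.76, 0.136)]:
        baseline = abs_gain / rel_gain
        print(f"{name}: implied baseline F1 ~ {baseline:.1f}, augmented F1 ~ {baseline + abs_gain:.1f}")

This yields implied baselines of roughly 67 F1 on ATIS and 42 F1 on Snips, which is plausible for the "only hundreds of training utterances" setting.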
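
To make the Software Dependencies and Experiment Setup rows more concrete, the following is a minimal, hypothetical PyTorch sketch of how the quoted components and hyperparameters could be assembled. It is not the authors' released implementation (see the GitHub link above); the class name, vocabulary size, and label count are illustrative placeholders.

    import torch.nn as nn
    from torch.optim import Adam, AdamW        # AdamW per Loshchilov and Hutter (2019)
    from transformers import GPT2LMHeadModel   # transformer library of Wolf et al. (2019)

    # Generation side: fine-tune GPT-2 with AdamW at the reported initial learning rate.
    gen_model = GPT2LMHeadModel.from_pretrained("gpt2")
    gen_optimizer = AdamW(gen_model.parameters(), lr=6.25e-5)  # the paper also reports 5e-5

    # Slot-filling side: a Bi-LSTM tagger with 300-d (GloVe-initialized) word embeddings,
    # 128-d hidden states, and dropout 0.5, trained with Adam.
    class BiLSTMTagger(nn.Module):
        def __init__(self, vocab_size, num_labels, emb_dim=300, hidden_dim=128, dropout=0.5):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, emb_dim)  # initialize with GloVe vectors
            self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
            self.dropout = nn.Dropout(dropout)
            self.classifier = nn.Linear(2 * hidden_dim, num_labels)

        def forward(self, token_ids):
            hidden, _ = self.lstm(self.embedding(token_ids))
            return self.classifier(self.dropout(hidden))

    tagger = BiLSTMTagger(vocab_size=10000, num_labels=120)  # placeholder sizes
    tagger_optimizer = Adam(tagger.parameters())             # batch size varied in {16, 128}

The λ and γ values quoted above belong to the paper's own training objective and are not modeled in this sketch.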