KnowDA: All-in-One Knowledge Mixture Model for Data Augmentation in Low-Resource NLP

Authors: Yufei Wang, Jiayi Zheng, Can Xu, Xiubo Geng, Tao Shen, Chongyang Tao, Daxin Jiang

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that i) the synthetic data produced by KnowDA successfully improves the performance of strong pre-trained language models (i.e., BERT, ALBERT and DeBERTa) by a large margin on the low-resource NLP benchmarks FewGLUE, CoNLL'03 and WikiAnn; ii) KnowDA successfully transfers the task knowledge to NLP tasks whose types are seen and unseen in KoMT.
Researcher Affiliation | Collaboration | Yufei Wang (1), Jiayi Zheng (2), Can Xu (3), Xiubo Geng (3), Tao Shen (3), Chongyang Tao (3), Daxin Jiang (3); (1) Macquarie University, Sydney, Australia; (2) Peking University, Beijing, China; (3) Microsoft Corporation, Beijing, China
Pseudocode | No | The paper describes procedures in natural language and illustrates them with figures, but it does not include formal pseudocode blocks or algorithm listings.
Open Source Code | Yes | The source code is released at https://github.com/GaryYufei/ICLR2023_KnowDA.
Open Datasets | Yes | We conduct low-resource experiments on the FewGLUE (Schick & Schütze, 2020), CoNLL'03 (Sang & De Meulder, 2003), and WikiAnn (Pan et al., 2017) benchmarks. ... Similar to Ye et al. (2021), we select English monolingual datasets with open access in the Huggingface Datasets (Lhoest et al., 2021). See the dataset-loading sketch after the table.
Dataset Splits | No | The paper mentions running experiments multiple times with different random seeds and data splits, and specifies the number of training examples (e.g., 32 for FewGLUE, 40 for CoNLL'03, 30 for WikiAnn), but does not provide specific percentages or counts for a validation dataset split.
Hardware Specification | Yes | We train KnowDA for 100k steps with a maximum sequence length of 512 and batch size 2048 in a Linux environment with 16 A100 GPUs (40G). Fine-tuning KnowDA is carried out using only a single A100 GPU (40G).
Software Dependencies | No | The paper mentions software such as the "T5-1.1-Large model" and "Adam as the optimizer" but does not specify version numbers for general software dependencies or libraries such as Python, PyTorch, etc.
Experiment Setup | Yes | We train KnowDA for 100k steps with a maximum sequence length of 512 and batch size 2048 in a Linux environment with 16 A100 GPUs (40G). Fine-tuning KnowDA is carried out using only a single A100 GPU (40G). We use Adam as the optimizer to train KnowDA. ... we simply fine-tune KnowDA (i.e., updating all parameters) with batch size 12 and a learning rate of 5e-6 for 500 steps. See the fine-tuning sketch after the table.
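
As a hedged illustration of the benchmarks named in the Open Datasets row, the sketch below loads the public datasets through the Huggingface Datasets library and draws low-resource subsets of the sizes mentioned in the Dataset Splits row. The dataset identifiers ("super_glue" standing in for FewGLUE's task data, "conll2003", "wikiann"/"en") and the shuffling seed are assumptions for illustration; they are not taken from the authors' released code.

```python
# Minimal sketch (assumptions, not the authors' code): loading the public
# benchmarks quoted in the Open Datasets row via Huggingface Datasets.
from datasets import load_dataset

rte = load_dataset("super_glue", "rte", split="train")      # FewGLUE tasks reuse SuperGLUE data
conll03 = load_dataset("conll2003", split="train")          # CoNLL'03 NER
wikiann_en = load_dataset("wikiann", "en", split="train")   # WikiAnn English NER

# Low-resource subsets of the sizes mentioned in the Dataset Splits row
# (32 / 40 / 30 training examples); the seed here is an arbitrary choice.
few_rte = rte.shuffle(seed=42).select(range(32))
few_conll = conll03.shuffle(seed=42).select(range(40))
few_wikiann = wikiann_en.shuffle(seed=42).select(range(30))

print(len(few_rte), len(few_conll), len(few_wikiann))  # 32 40 30
```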
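The Experiment Setup row quotes concrete fine-tuning hyperparameters (full-parameter updates, Adam, batch size 12, learning rate 5e-6, 500 steps, maximum sequence length 512). The sketch below shows what such a loop could look like with PyTorch and Transformers; treating a T5-1.1-Large checkpoint as the KnowDA backbone and the placeholder `get_batch` data pipeline are assumptions for illustration, not the released implementation.

```python
# Minimal sketch (assumptions, not the released KnowDA code): full-parameter
# fine-tuning of a T5-1.1-Large backbone with the hyperparameters quoted in
# the Experiment Setup row (Adam, batch size 12, lr 5e-6, 500 steps).
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "google/t5-v1_1-large"   # assumed backbone; KnowDA adds KoMT pre-training on top
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-6)

def get_batch(step, batch_size=12):
    # Placeholder data pipeline (hypothetical): in practice this would yield
    # task-specific (input, target) text pairs for data augmentation.
    inputs = ["augment: a tiny placeholder sentence."] * batch_size
    targets = ["a tiny placeholder sentence."] * batch_size
    return inputs, targets

model.train()
for step in range(500):                              # 500 fine-tuning steps, as quoted
    inputs, targets = get_batch(step)
    enc = tokenizer(inputs, padding=True, truncation=True,
                    max_length=512, return_tensors="pt").to(device)
    labels = tokenizer(targets, padding=True, truncation=True,
                       max_length=512, return_tensors="pt").input_ids.to(device)
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding tokens in the loss

    loss = model(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```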