CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP

Authors: Libo Qin, Minheng Ni, Yue Zhang, Wanxiang Che

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on five tasks with 19 languages show that our method leads to significantly improved performances for all the tasks compared with mBERT.
Researcher Affiliation | Academia | 1 Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China; 2 School of Engineering, Westlake University, China; 3 Institute of Advanced Technology, Westlake Institute for Advanced Study
Pseudocode | Yes | Algorithm 1 shows pseudocode for the multi-lingual code-switching data augmentation process, where lines 1-2 denote the sentence selection step, lines 3-6 denote the word selection and lines 7-11 denote the replacement selection step. (A minimal code sketch of this augmentation loop follows the table.)
Open Source Code | Yes | All codes are publicly available at: https://github.com/kodenii/CoSDA-ML.
Open Datasets | Yes | We use XNLI [Conneau et al., 2018], which covers 15 languages for natural language inference. We use the OpeNER English and Spanish datasets, and the MultiBooked Catalan and Basque datasets. We use MLDoc [Schwenk and Li, 2018] for document classification. ...we use the Multilingual WOZ 2.0 dataset [Mrkšić et al., 2017]... We follow Schuster et al. [2019b] and use the cross-lingual spoken language understanding dataset...
Dataset Splits | Yes | In fine-tuning, we select the best hyperparameters by searching a combination of batch size, learning rate, the number of fine-tuning epochs and replacement ratio with the following ranges: learning rate {1×10⁻⁶, 2×10⁻⁶, 3×10⁻⁶, 4×10⁻⁶, 5×10⁻⁶, 1×10⁻⁵}; batch size {8, 16, 32}; number of epochs {4, 10, 20, 40, 100}; token and sentence replacement ratio {0.4, 0.5, 0.6, 0.8, 0.9, 1.0}. Note that the best model is saved by development performance in English.
Hardware Specification | No | No specific hardware details (GPU/CPU models, processor types, or memory) used for running experiments were mentioned.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) were listed.
Experiment Setup | Yes | In fine-tuning, we select the best hyperparameters by searching a combination of batch size, learning rate, the number of fine-tuning epochs and replacement ratio with the following ranges: learning rate {1×10⁻⁶, 2×10⁻⁶, 3×10⁻⁶, 4×10⁻⁶, 5×10⁻⁶, 1×10⁻⁵}; batch size {8, 16, 32}; number of epochs {4, 10, 20, 40, 100}; token and sentence replacement ratio {0.4, 0.5, 0.6, 0.8, 0.9, 1.0}. (A sketch of this search grid follows the table.)
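
The Pseudocode row summarizes Algorithm 1 as three steps: sentence selection, word selection, and replacement selection. The following is a minimal Python sketch of that loop, assuming bilingual dictionaries (e.g. MUSE-style) that map a language code to a {source word: translation} table; the function name `cosda_augment` and the ratio arguments are illustrative, not the authors' exact implementation.

```python
import random

def cosda_augment(sentence, dictionaries, sent_ratio=0.9, word_ratio=0.9):
    """Sketch of multi-lingual code-switching augmentation.

    `sentence` is a list of tokens; `dictionaries` maps a language code to a
    {source_word: translated_word} bilingual dictionary. Names and default
    ratios here are illustrative assumptions, not the paper's exact code.
    """
    # Sentence selection: only a fraction of training sentences are augmented.
    if random.random() > sent_ratio:
        return sentence

    augmented = []
    for word in sentence:
        # Word selection: decide per token whether to code-switch it.
        if random.random() < word_ratio:
            # Replacement selection: pick a random target language and
            # substitute the word with its dictionary translation if one exists.
            lang = random.choice(list(dictionaries))
            augmented.append(dictionaries[lang].get(word, word))
        else:
            augmented.append(word)
    return augmented
```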
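
The hyperparameter search quoted in the Dataset Splits and Experiment Setup rows can be written down as a small grid. The sketch below only enumerates the quoted ranges; the dictionary keys and the `configurations` variable are illustrative names, and the selection step itself (fine-tuning mBERT on each configuration and keeping the one with the best English development performance) is not shown.

```python
from itertools import product

# Search ranges taken verbatim from the paper's fine-tuning description.
search_space = {
    "learning_rate": [1e-6, 2e-6, 3e-6, 4e-6, 5e-6, 1e-5],
    "batch_size": [8, 16, 32],
    "epochs": [4, 10, 20, 40, 100],
    "replacement_ratio": [0.4, 0.5, 0.6, 0.8, 0.9, 1.0],
}

# Enumerate every combination of the quoted ranges.
configurations = [
    dict(zip(search_space, values))
    for values in product(*search_space.values())
]
print(f"{len(configurations)} candidate configurations")  # 6 * 3 * 5 * 6 = 540
```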