Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Improving Zero-Shot Cross-Lingual Transfer via Progressive Code-Switching
Authors: Zhuoran Li, Chunming Hu, Junfan Chen, Zhijun Chen, Xiaohui Guo, Richong Zhang
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show our model achieves state-of-the-art results on three different zero-shot cross-lingual transfer tasks across ten languages. |
| Researcher Affiliation | Academia | (1) SKLSDE, School of Computer Science and Engineering, Beihang University, Beijing, China; (2) School of Software, Beihang University, Beijing, China; (3) Hangzhou Innovation Institute, Beihang University, Hangzhou, China |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or a link to the open-source code for the methodology described. |
| Open Datasets | Yes | To comprehensively evaluate our proposed method, we conduct experiments on three types of cross-lingual transfer tasks with three widely used datasets. (1) For paraphrase identification, we employ PAWS-X dataset [Yang et al., 2019] containing seven languages. ... (2) For document classification, we employ MLDoc [Schwenk and Li, 2018] as our document classification dataset... (3) For spoken language understanding, we use the cross-lingual task-oriented dialogue dataset (XTOD) [Schuster et al., 2019]... |
| Dataset Splits | Yes | Table 1: Summary statistics of datasets. PAWS-X: 7 languages; 49,401 train / 2,000 dev / 2,000 test; 2 labels; metric Acc. MLDoc: 8 languages; 10,000 train / 1,000 dev / 2,000 test; 4 labels; metric Acc. XTOD: 3 languages; 30,521 train / 4,181 dev / 2,368 test; 12/11 labels; metric Acc./F1. |
| Hardware Specification | Yes | All models are trained on a single Tesla V100 32GB GPU. |
| Software Dependencies | No | The paper mentions 'Hugging Face Transformer' as the backbone model and 'AdamW' as the optimizer but does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | We set the batch size to 16 or 64, the maximum sequence length to 128, and the dropout rate to 0.1, and we use AdamW as the optimizer. We select the best learning rate from {5e-6, 1e-5} for the encoder and {1e-3, 1e-5} for the task-specific network layer. As for the scheduler, we initialize τ = 0, which linearly increases as the stage increases. |
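
For readers trying to reproduce the setup, the values quoted in the Experiment Setup row can be written down as a small search grid plus a linear schedule for τ. This is only a sketch under stated assumptions, not the authors' code: the function names `search_grid` and `tau_at_stage` are hypothetical, treating the batch size as a searched dimension is an assumption (the paper may fix it per task), and `num_stages` and `tau_max` are placeholders because the quoted text does not report the number of stages or the final value of τ.

```python
from itertools import product

# Values quoted in the Experiment Setup row.
BATCH_SIZES = (16, 64)          # "16 or 64"; searching over both is an assumption
MAX_SEQ_LEN = 128
DROPOUT = 0.1
ENCODER_LRS = (5e-6, 1e-5)      # candidates for the encoder
TASK_LAYER_LRS = (1e-3, 1e-5)   # candidates for the task-specific layer


def search_grid():
    """Enumerate every combination of the reported candidate values."""
    return [
        {
            "batch_size": batch_size,
            "encoder_lr": encoder_lr,
            "task_layer_lr": task_layer_lr,
            "max_seq_len": MAX_SEQ_LEN,
            "dropout": DROPOUT,
            "optimizer": "AdamW",
        }
        for batch_size, encoder_lr, task_layer_lr in product(
            BATCH_SIZES, ENCODER_LRS, TASK_LAYER_LRS
        )
    ]


def tau_at_stage(stage, num_stages, tau_max=1.0):
    """Linear schedule: tau is 0 at stage 0 and tau_max at the last stage.

    num_stages and tau_max are hypothetical; the source does not report them.
    """
    if num_stages < 2:
        raise ValueError("num_stages must be at least 2")
    if not 0 <= stage < num_stages:
        raise ValueError("stage must satisfy 0 <= stage < num_stages")
    return tau_max * stage / (num_stages - 1)
```

Calling `search_grid()` yields one dictionary per candidate configuration, and `tau_at_stage(stage, num_stages)` gives the value of τ to use at a given stage under the assumed endpoints.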