Enhancing Cross-lingual Transfer by Manifold Mixup
Authors: Huiyun Yang, Huadong Chen, Hao Zhou, Lei Li
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the XTREME benchmark show X-MIXUP achieves 1.8% performance gains on multiple text understanding tasks, compared with strong baselines, and reduces the cross-lingual representation discrepancy significantly. |
| Researcher Affiliation | Collaboration | Huiyun Yang¹, Huadong Chen¹, Hao Zhou¹, Lei Li² — ¹ByteDance AI Lab, ²University of California, Santa Barbara |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks; method details are described in paragraph text and mathematical equations (a hedged sketch of the manifold-mixup step is provided after the table). |
| Open Source Code | Yes | Code is available at https://github.com/yhy1117/X-Mixup. |
| Open Datasets | Yes | We utilize the translate-train and translate-test data from the XTREME repo (https://github.com/google-research/xtreme), which also provides the pseudo-labels of the translate-train data for classification and question answering tasks. The remaining translation data are from Google Translate. |
| Dataset Splits | Yes | We select XNLI, POS, and MLQA as representative tasks to search for the best hyper-parameters. The final model is selected based on the averaged performance of all languages on the dev set. |
| Hardware Specification | Yes | For all tasks, we fine-tune on 8 Nvidia V100-32GB GPU cards with a batch size of 64. |
| Software Dependencies | No | The paper mentions using 'Huggingface Transformers' as the backbone model but does not provide specific version numbers for it or any other software dependencies. |
| Experiment Setup | Yes | We perform a grid search over the balance training parameter α and the learning rate, drawn from [0.2, 0.4, 0.6, 0.8] and [3e-6, 5e-6, 2e-5, 3e-5] respectively. We also search for the best manifold mixup layer from [1, 4, 8, 12, 16, 20, 24] (a sketch of this grid search is given after the table). |
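
Since the paper gives no pseudocode, the following is a minimal sketch of the manifold-mixup step applied to intermediate Transformer hidden states, not the authors' released implementation. It assumes the source sentence and its translation have already been aligned to the same shape (X-Mixup aligns them with a cross-attention mechanism that is omitted here); `mixup_hidden_states`, `EncoderWithMixup`, `mixup_layer`, and `lam` are hypothetical names introduced for illustration.

```python
import torch
from torch import nn


def mixup_hidden_states(h_src: torch.Tensor, h_trans: torch.Tensor, lam: float) -> torch.Tensor:
    # Linear interpolation of two same-shape hidden-state tensors of shape
    # (batch, seq_len, hidden); lam = 1.0 keeps the source states unchanged.
    return lam * h_src + (1.0 - lam) * h_trans


class EncoderWithMixup(nn.Module):
    """Hypothetical wrapper: runs a stack of encoder layers on the source
    sentence and its translation, and mixes their hidden states once at
    `mixup_layer` (the layer index searched over in the paper)."""

    def __init__(self, layers: nn.ModuleList, mixup_layer: int, lam: float = 0.5):
        super().__init__()
        self.layers = layers
        self.mixup_layer = mixup_layer
        self.lam = lam

    def forward(self, h_src: torch.Tensor, h_trans: torch.Tensor) -> torch.Tensor:
        for i, layer in enumerate(self.layers):
            h_src, h_trans = layer(h_src), layer(h_trans)
            if i == self.mixup_layer:
                # Assumes h_src and h_trans are already length-aligned;
                # the paper uses cross-attention alignment instead.
                h_src = mixup_hidden_states(h_src, h_trans, self.lam)
        return h_src
```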
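
For reference, the hyper-parameter search described in the Experiment Setup row can be reproduced with a plain exhaustive loop over the quoted search spaces. `train_and_eval` below is a hypothetical callback that fine-tunes one configuration and returns the dev-set score averaged over all languages; it is not part of the released code.

```python
import itertools
from typing import Callable, Dict, Tuple

# Search spaces quoted from the paper's experiment setup.
ALPHAS = [0.2, 0.4, 0.6, 0.8]               # balance training parameter α
LEARNING_RATES = [3e-6, 5e-6, 2e-5, 3e-5]
MIXUP_LAYERS = [1, 4, 8, 12, 16, 20, 24]


def grid_search(train_and_eval: Callable[..., float]) -> Tuple[Dict, float]:
    """Return the best configuration and its averaged dev score."""
    best_cfg, best_score = None, float("-inf")
    for alpha, lr, layer in itertools.product(ALPHAS, LEARNING_RATES, MIXUP_LAYERS):
        score = train_and_eval(alpha=alpha, learning_rate=lr, mixup_layer=layer)
        if score > best_score:
            best_cfg, best_score = {"alpha": alpha, "lr": lr, "mixup_layer": layer}, score
    return best_cfg, best_score
```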