Enhancing Cross-lingual Transfer by Manifold Mixup

Authors: Huiyun Yang, Huadong Chen, Hao Zhou, Lei Li

ICLR 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments on the XTREME benchmark show X-MIXUP achieves 1.8% performance gains on multiple text understanding tasks, compared with strong baselines, and significantly reduces the cross-lingual representation discrepancy. |
| Researcher Affiliation | Collaboration | Huiyun Yang (ByteDance AI Lab), Huadong Chen (ByteDance AI Lab), Hao Zhou (ByteDance AI Lab), Lei Li (University of California, Santa Barbara) |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. Method details are described in paragraph text and mathematical equations; a hedged interpolation sketch appears below this table. |
| Open Source Code | Yes | Code is available at https://github.com/yhy1117/X-Mixup. |
| Open Datasets | Yes | We utilize the translate-train and translate-test data from the XTREME repo (https://github.com/google-research/xtreme), which also provides pseudo-labels of the translate-train data for the classification and question answering tasks. The remaining translation data are from Google Translate. |
| Dataset Splits | Yes | We select XNLI, POS, and MLQA as representative tasks to search for the best hyper-parameters. The final model is selected based on the averaged performance of all languages on the dev set. |
| Hardware Specification | Yes | For all tasks, we fine-tune on 8 Nvidia V100-32GB GPU cards with a batch size of 64. |
| Software Dependencies | No | The paper mentions using Huggingface Transformers for its backbone model but does not provide specific version numbers for it or any other software dependencies. |
| Experiment Setup | Yes | We perform a grid search over the balance training parameter α and the learning rate from [0.2, 0.4, 0.6, 0.8] and [3e-6, 5e-6, 2e-5, 3e-5]. We also search for the best manifold mixup layer from [1, 4, 8, 12, 16, 20, 24]. |
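The experiment-setup quote above translates directly into a small hyper-parameter sweep. The following is a minimal sketch, assuming a hypothetical `train_and_eval` function that fine-tunes the model with one configuration and returns the dev score averaged over all languages; it is not the authors' training code.

```python
from itertools import product

# Search space quoted from the paper's experiment setup.
alphas = [0.2, 0.4, 0.6, 0.8]               # balance training parameter α
learning_rates = [3e-6, 5e-6, 2e-5, 3e-5]   # learning rate
mixup_layers = [1, 4, 8, 12, 16, 20, 24]    # manifold mixup layer

def train_and_eval(alpha: float, lr: float, layer: int) -> float:
    """Hypothetical stand-in: fine-tune with this configuration on the
    representative tasks (XNLI, POS, MLQA) and return the dev-set score
    averaged over all languages. Here it just returns a dummy value."""
    return 0.0

best_score, best_cfg = float("-inf"), None
for alpha, lr, layer in product(alphas, learning_rates, mixup_layers):
    score = train_and_eval(alpha, lr, layer)
    if score > best_score:
        best_score, best_cfg = score, (alpha, lr, layer)

print("best configuration (alpha, lr, layer):", best_cfg, "dev score:", best_score)
```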
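Since the paper describes the method only in prose and equations, the sketch below illustrates the basic idea of manifold mixup on transformer hidden states: interpolating source and translated-target representations at a chosen layer. The function name, shapes, and plain linear interpolation are assumptions for illustration; the authors' actual implementation (including how source and target tokens are aligned) is in the linked repository.

```python
import torch

def mixup_hidden_states(h_src: torch.Tensor,
                        h_tgt: torch.Tensor,
                        lam: float) -> torch.Tensor:
    """Linearly interpolate two hidden-state tensors of the same shape.

    h_src, h_tgt: [batch, seq_len, hidden] representations of a source
    sentence and its translation at the chosen mixup layer.
    lam: interpolation ratio in [0, 1].
    """
    # Hypothetical simplification: a real cross-lingual mixup must first
    # align source and target token positions (translations rarely share
    # sequence lengths), which this toy sketch does not do.
    return lam * h_src + (1.0 - lam) * h_tgt

# Toy usage with random tensors standing in for layer-8 hidden states.
h_src = torch.randn(2, 16, 768)
h_tgt = torch.randn(2, 16, 768)
h_mix = mixup_hidden_states(h_src, h_tgt, lam=0.5)
```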