Cross-Lingual Transfer with Class-Weighted Language-Invariant Representations
Authors: Ruicheng Xian, Heng Ji, Han Zhao
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | we propose and evaluate a method for unsupervised transfer, called importance-weighted domain alignment (IWDA), that performs representation alignment with prior shift estimation and correction using unlabeled target language task data. Experiments demonstrate its superiority under large prior shifts, and show further performance gains when combined with existing semi-supervised learning techniques. (An illustrative sketch of this objective is given after the table.) |
| Researcher Affiliation | Academia | Ruicheng Xian, Heng Ji & Han Zhao, Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA |
| Pseudocode | No | The paper describes the algorithm and its components mathematically and conceptually, but does not include a formally structured pseudocode block or algorithm listing. |
| Open Source Code | Yes | Our code is available at https://github.com/rxian/domain-alignment. |
| Open Datasets | Yes | To study their effects on transfer performance empirically, we compare model performance of mBERT (cased) and XLM-R Large against the alignment of their class-conditioned features and prior shift of the dataset on three multilingual downstream classification tasks: sentiment analysis on the Multilingual Amazon Reviews Corpus (MARC), which covers six high-resource languages; named-entity recognition on the WikiANN dataset, which covers 39 languages of varying linguistic properties and resources; and textual entailment on the XNLI dataset, which covers 15 languages. |
| Dataset Splits | Yes | To simulate these conditions and study the effects of class prior shifts, we perform our evaluations on 500 smaller datasets subsampled from MARC with various class priors (each contains 2,500 test examples), and 700 from WikiANN. (A subsampling sketch is given after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions models like 'mBERT (cased) and XLM-R Large' and the 'AdamW optimizer' but does not specify version numbers for any software libraries, frameworks, or environments (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The hyperparameter settings are included in Appendix C.2. ... Zero-Shot Fine-Tuning. Learning rate is 1e-5 with 10% warmup and a linear schedule. Batch size is 8. ... IWDA. Model learning rate is 1e-5 with 10% warmup and a linear schedule. Adversary learning rate is 5e-4 with a weight decay of 0.01, λ_gp is 10, and λ_da is 5e-3 with 10% warmup. lr_IW is 5e-4, λ_IW (weight decay) is 2, and λ_IW,init is 0.25. Batch size is 8 per domain (16 in total per step). (An optimizer/scheduler sketch for the fine-tuning settings is given after the table.) |
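The Research Type row describes IWDA as representation alignment with prior shift estimation and correction using unlabeled target-language data. Below is a minimal PyTorch sketch of an importance-weighted domain-adversarial objective under stated assumptions: the module names (`encoder`, `classifier`, `discriminator`) and the logistic domain discriminator are illustrative choices, not the authors' implementation, and the adversary's gradient penalty (λ_gp in the reported hyperparameters) is omitted.

```python
# Minimal sketch of an importance-weighted domain-adversarial objective.
# Module names and the logistic discriminator are illustrative assumptions.
import torch
import torch.nn.functional as F

def iwda_losses(encoder, classifier, discriminator,
                src_x, src_y, tgt_x, class_weights):
    """Return (task_loss, domain_loss) for one source/target batch pair.

    `class_weights[c]` is the estimated target/source prior ratio for class c;
    reweighting source examples by it corrects for class-prior shift before
    the representations are aligned.
    """
    z_src = encoder(src_x)                 # shared multilingual features
    z_tgt = encoder(tgt_x)

    # Supervised task loss on the labeled source language.
    task_loss = F.cross_entropy(classifier(z_src), src_y)

    # Importance-weighted domain discrimination loss: the discriminator is
    # trained to minimize it, while the encoder is trained to maximize it.
    w = class_weights[src_y]               # per-example importance weights
    d_src = discriminator(z_src).squeeze(-1)
    d_tgt = discriminator(z_tgt).squeeze(-1)
    domain_loss = -(w * F.logsigmoid(d_src)).mean() - F.logsigmoid(-d_tgt).mean()
    return task_loss, domain_loss
```

In a full training loop the discriminator would be updated to minimize `domain_loss`, while the encoder minimizes `task_loss` and maximizes `domain_loss` scaled by λ_da, typically via a gradient reversal layer or alternating updates.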
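The Dataset Splits row mentions evaluation sets subsampled from MARC and WikiANN with various class priors. The following is a minimal sketch of how such a prior-controlled subsample could be drawn; the function name and arguments are hypothetical, and the paper's actual subsampling procedure may differ.

```python
# Sketch of drawing an evaluation subsample with a chosen class prior.
import numpy as np

def subsample_with_prior(labels, prior, size, seed=0):
    """Draw about `size` indices so that class c appears with frequency prior[c]."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    idx = []
    for c, p in enumerate(prior):
        pool = np.flatnonzero(labels == c)        # all examples of class c
        n_c = int(round(p * size))                # rounding may shift the total slightly
        idx.append(rng.choice(pool, size=n_c, replace=False))
    return rng.permutation(np.concatenate(idx))

# Example (hypothetical prior): a 2,500-example MARC-style test split
# skewed toward one class to simulate prior shift.
# indices = subsample_with_prior(test_labels, prior=[0.7, 0.1, 0.1, 0.05, 0.05], size=2500)
```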
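The Experiment Setup row reports the zero-shot fine-tuning hyperparameters (AdamW optimization at learning rate 1e-5, 10% warmup, linear schedule, batch size 8). A minimal sketch of wiring these up with PyTorch and Hugging Face `transformers` is shown below; the total number of training steps is a placeholder not quoted in the excerpt, and the separate adversary and importance-weight optimizers used by IWDA are omitted.

```python
# Sketch of the reported zero-shot fine-tuning settings (assumed to use
# Hugging Face `transformers` + PyTorch; library versions are not given).
import torch
from transformers import AutoModelForSequenceClassification, get_linear_schedule_with_warmup

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased"   # mBERT (cased); XLM-R Large is analogous
)

num_training_steps = 10_000          # placeholder; not reported in this excerpt
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),  # 10% warmup
    num_training_steps=num_training_steps,           # linear decay afterwards
)
batch_size = 8                       # 8 per domain (16 per step) for IWDA
```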