Cross-Lingual Transfer with Class-Weighted Language-Invariant Representations
Authors: Ruicheng Xian, Heng Ji, Han Zhao
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | we propose and evaluate a method for unsupervised transfer, called importance-weighted domain alignment (IWDA), that performs representation alignment with prior shift estimation and correction using unlabeled target language task data. Experiments demonstrate its superiority under large prior shifts, and show further performance gains when combined with existing semi-supervised learning techniques. (An illustrative sketch of this objective is given after the table.) |
| Researcher Affiliation | Academia | Ruicheng Xian, Heng Ji & Han Zhao, Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA |
| Pseudocode | No | The paper describes the algorithm and its components mathematically and conceptually, but does not include a formally structured pseudocode block or algorithm listing. |
| Open Source Code | Yes | Our code is available at https://github.com/rxian/domain-alignment. |
| Open Datasets | Yes | To study their effects on transfer performance empirically, we compare model performance of mBERT (cased) and XLM-R Large against the alignment of their class-conditioned features and prior shift of the dataset on three multilingual downstream classification tasks: sentiment analysis on the Multilingual Amazon Reviews Corpus (MARC), which covers six high-resource languages; named-entity recognition on the WikiANN dataset, which covers 39 languages of varying linguistic properties and resources; and textual entailment on the XNLI dataset, which covers 15 languages. |
| Dataset Splits | Yes | To simulate these conditions and study the effects of class prior shifts, we perform our evaluations on 500 smaller datasets subsampled from MARC with various class priors (each contains 2,500 test examples), and 700 from WikiANN. (A subsampling sketch is given after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions models like 'mBERT (cased) and XLM-R Large' and the 'AdamW optimizer' but does not specify version numbers for any software libraries, frameworks, or environments (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | The hyperparameter settings are included in Appendix C.2. ... Zero-Shot Fine-Tuning. Learning rate is 1e-5 with 10% warmup and a linear schedule. Batch size is 8. ... IWDA. Model learning rate is 1e-5 with 10% warmup and a linear schedule. Adversary learning rate is 5e-4 with a weight decay of 0.01, λ_gp is 10, and λ_da is 5e-3 with 10% warmup. lr_IW is 5e-4, λ_IW (weight decay) is 2, and λ_IW,init is 0.25. Batch size is 8 per domain (16 in total per step). (An optimizer/scheduler sketch for the fine-tuning settings is given after the table.) |
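The Research Type row describes IWDA as representation alignment with prior shift estimation and correction using unlabeled target-language data. Below is a minimal PyTorch sketch of an importance-weighted domain-adversarial objective under stated assumptions: the module names (`encoder`, `classifier`, `discriminator`) and the logistic domain discriminator are illustrative choices, not the authors' implementation, and the adversary's gradient penalty (λ_gp in the reported hyperparameters) is omitted.

```python
# Minimal sketch of an importance-weighted domain-adversarial objective.
# Module names and the logistic discriminator are illustrative assumptions.
import torch
import torch.nn.functional as F

def iwda_losses(encoder, classifier, discriminator,
                src_x, src_y, tgt_x, class_weights):
    """Return (task_loss, domain_loss) for one source/target batch pair.

    `class_weights[c]` is the estimated target/source prior ratio for class c;
    reweighting source examples by it corrects for class-prior shift before
    the representations are aligned.
    """
    z_src = encoder(src_x)                 # shared multilingual features
    z_tgt = encoder(tgt_x)

    # Supervised task loss on the labeled source language.
    task_loss = F.cross_entropy(classifier(z_src), src_y)

    # Importance-weighted domain discrimination loss: the discriminator is
    # trained to minimize it, while the encoder is trained to maximize it.
    w = class_weights[src_y]               # per-example importance weights
    d_src = discriminator(z_src).squeeze(-1)
    d_tgt = discriminator(z_tgt).squeeze(-1)
    domain_loss = -(w * F.logsigmoid(d_src)).mean() - F.logsigmoid(-d_tgt).mean()
    return task_loss, domain_loss
```

In a full training loop the discriminator would be updated to minimize `domain_loss`, while the encoder minimizes `task_loss` and maximizes `domain_loss` scaled by λ_da, typically via a gradient reversal layer or alternating updates.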
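The Dataset Splits row mentions evaluation sets subsampled from MARC and WikiANN with various class priors. The following is a minimal sketch of how such a prior-controlled subsample could be drawn; the function name and arguments are hypothetical, and the paper's actual subsampling procedure may differ.

```python
# Sketch of drawing an evaluation subsample with a chosen class prior.
import numpy as np

def subsample_with_prior(labels, prior, size, seed=0):
    """Draw about `size` indices so that class c appears with frequency prior[c]."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    idx = []
    for c, p in enumerate(prior):
        pool = np.flatnonzero(labels == c)        # all examples of class c
        n_c = int(round(p * size))                # rounding may shift the total slightly
        idx.append(rng.choice(pool, size=n_c, replace=False))
    return rng.permutation(np.concatenate(idx))

# Example (hypothetical prior): a 2,500-example MARC-style test split
# skewed toward one class to simulate prior shift.
# indices = subsample_with_prior(test_labels, prior=[0.7, 0.1, 0.1, 0.05, 0.05], size=2500)
```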
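The Experiment Setup row reports the zero-shot fine-tuning hyperparameters (AdamW optimization at learning rate 1e-5, 10% warmup, linear schedule, batch size 8). A minimal sketch of wiring these up with PyTorch and Hugging Face `transformers` is shown below; the total number of training steps is a placeholder not quoted in the excerpt, and the separate adversary and importance-weight optimizers used by IWDA are omitted.

```python
# Sketch of the reported zero-shot fine-tuning settings (assumed to use
# Hugging Face `transformers` + PyTorch; library versions are not given).
import torch
from transformers import AutoModelForSequenceClassification, get_linear_schedule_with_warmup

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased"   # mBERT (cased); XLM-R Large is analogous
)

num_training_steps = 10_000          # placeholder; not reported in this excerpt
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),  # 10% warmup
    num_training_steps=num_training_steps,           # linear decay afterwards
)
batch_size = 8                       # 8 per domain (16 per step) for IWDA
```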