Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model

Authors: Juntao Li, Ruidan He, Hai Ye, Hwee Tou Ng, Lidong Bing, Rui Yan

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our proposed method achieves significant performance improvements over the state-of-the-art pretrained cross-lingual language model in the CLCD setting.
Researcher Affiliation | Collaboration | Juntao Li (1,2), Ruidan He (3), Hai Ye (2), Hwee Tou Ng (2), Lidong Bing (3), and Rui Yan (1). 1: Center for Data Science, Academy for Advanced Interdisciplinary Studies, Peking University; 2: Department of Computer Science, National University of Singapore; 3: DAMO Academy, Alibaba Group.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide access to its source code, such as a repository link or an explicit code release statement for the described method.
Open Datasets | Yes | We conduct experiments on the multi-lingual and multi-domain Amazon review dataset [Prettenhofer and Stein, 2010].
Dataset Splits | Yes | There are a training set and a test set for each domain in each language and both consist of 1,000 positive reviews and 1,000 negative reviews. ... We utilize 100 labeled data in the target language and target domain as the validation set, which is used for hyperparameter tuning and model selection during training.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions specific models such as XLM and the Adam optimizer, but it does not provide version numbers for the software dependencies or libraries used in its implementation.
Experiment Setup | Yes | The hidden dimension of XLM is 1024. The input and output dimensions of the feedforward layers in both Fs and Fp are 1024. ... All trainable parameters are initialized from a uniform distribution [-0.1, 0.1]. ... both UFD and the task-specific module are optimized by Adam [Kingma and Ba, 2014] with a learning rate of 1 × 10^-4. The batch sizes for training UFD and the task-specific module are set to 16 and 8, respectively. The weights α, β, γ in Equation (6) are set to 1, 0.3, and 1, respectively.
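
For quick reference, the following minimal PyTorch sketch collects the numeric settings quoted in the Experiment Setup row (hidden/feed-forward dimension, uniform initialization range, Adam learning rate, batch sizes, and Equation (6) loss weights). Only those values come from the paper; the `FeedForward` class, the ReLU activation, and the shared optimizer are illustrative assumptions, and the XLM encoder, UFD objectives, and training loop are omitted.

```python
# Illustrative configuration sketch; not the authors' released code.
import torch
import torch.nn as nn

HIDDEN_DIM = 1024                    # hidden dimension of XLM
INIT_RANGE = (-0.1, 0.1)             # uniform init range for trainable parameters
LR = 1e-4                            # Adam learning rate for UFD and task-specific module
BATCH_UFD = 16                       # batch size for training UFD
BATCH_TASK = 8                       # batch size for training the task-specific module
ALPHA, BETA, GAMMA = 1.0, 0.3, 1.0   # loss weights in Equation (6)

class FeedForward(nn.Module):
    """Feed-forward block standing in for Fs / Fp (input and output dims are both 1024).

    The single linear layer and ReLU are assumptions; the paper only reports the dimensions.
    """
    def __init__(self, dim=HIDDEN_DIM):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        # Initialize all trainable parameters uniformly in [-0.1, 0.1], as reported.
        for p in self.fc.parameters():
            nn.init.uniform_(p, *INIT_RANGE)

    def forward(self, x):
        return torch.relu(self.fc(x))

Fs, Fp = FeedForward(), FeedForward()
optimizer = torch.optim.Adam(list(Fs.parameters()) + list(Fp.parameters()), lr=LR)
```

This sketch is only meant to make the reported hyperparameters concrete; reproducing the method would additionally require the pretrained XLM encoder, the UFD losses combined with the weights above, and the dataset splits described earlier.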