Unsupervised Interlingual Semantic Representations from Sentence Embeddings for Zero-Shot Cross-Lingual Transfer

Authors: Channy Hong, Jaeyeon Lee, Jung Kwon Lee (pp. 7944-7951)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For our experiments, we evaluate the effectiveness of our framework in zero-shot transfer performances on non-English NLI tasks... We report the results of the zero-shot evaluations of our main model in Table 1, alongside those of the baseline model (BERT)... We also report the results of our few-shot scenario in Figure 2...
Researcher Affiliation | Collaboration | Channy Hong (Harvard University, Superb AI Inc.), Jaeyeon Lee (Superb AI Inc.), Jung Kwon Lee (Superb AI Inc.)
Pseudocode | No | The paper describes the proposed method using text and equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We provide the code implementations for training the ISR encoder as well as the task-specific classifier on top of ISR (github.com/ChannyHong/ISREncoder).
Open Datasets | Yes | XNLI. The primary metric we used for zero-shot transfer evaluations was the XNLI benchmark (Conneau et al. 2018)... CNLI. For our few-shot scenario, we use the Chinese NLI (CNLI) dataset... Monolingual Corpora. For our monolingual corpora, we used WikiExtractor to extract the publicly available Wikipedia dumps...
Dataset Splits | Yes | The XNLI benchmark provides 2,500 development examples and 5,000 test examples... Furthermore, XNLI provides the machine-translated versions of 392,702 MNLI training examples... the CNLI dataset, which includes 90,000 training examples, 10,000 development examples, and 10,000 test examples, all in Chinese. (See the split-loading sketch below the table.)
Hardware Specification | Yes | Training took about 60 hours on a single Tesla T4 GPU.
Software Dependencies | No | Our model is implemented in TensorFlow... For our fixed sentence embeddings, we used the bert-as-service library (Xiao 2018) and specifically its default settings... The paper lacks specific version numbers for these dependencies. (See the bert-as-service sketch below the table.)
Experiment Setup | Yes | For our Generator, we used 2 upsampling layers, 4 feedforward layers, and 2 downsampling layers for each encoder and decoder (16 layers total)... The Discriminator of our model used 2 shared upsampling layers, 1 shared feedforward layer, 2 shared downsampling layers, and 1 classification layer for each adversarial loss and domain classification loss (6 layers total)... We found that λ_Gcls of 10 worked best for training ISR (while holding all other λ to 1). We adopt the L2 distance as our distance measure between embeddings. (See the architecture sketch below the table.)
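
The Dataset Splits row quotes per-language XNLI dev/test sizes; the following is a minimal sketch of how such splits could be loaded, assuming the standard XNLI 1.0 TSV distribution (xnli.dev.tsv, xnli.test.tsv) with a `language` column. The file names, the column name, and the pandas usage are illustrative assumptions, not details from the paper.

```python
# Minimal sketch: loading one language's XNLI dev/test splits.
# Assumptions: standard XNLI 1.0 TSVs (xnli.dev.tsv, xnli.test.tsv) with a `language` column.
import pandas as pd

def load_xnli_split(path: str, language: str) -> pd.DataFrame:
    """Read an XNLI TSV and keep only the rows for a single language."""
    df = pd.read_csv(path, sep="\t", quoting=3)  # quoting=3 == csv.QUOTE_NONE
    return df[df["language"] == language]

dev = load_xnli_split("xnli.dev.tsv", "es")    # ~2,500 examples per language
test = load_xnli_split("xnli.test.tsv", "es")  # ~5,000 examples per language
print(len(dev), len(test))
```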
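
For the Software Dependencies row, the paper reports using bert-as-service with its default settings to obtain fixed sentence embeddings. Below is a hedged usage sketch: the multilingual BERT checkpoint path and server flags are illustrative, and only the client/encode pattern follows the library's documented API.

```python
# Hedged sketch: fixed sentence embeddings via bert-as-service defaults.
# The server runs as a separate process, e.g. (checkpoint path is an assumption):
#   bert-serving-start -model_dir /path/to/multi_cased_L-12_H-768_A-12 -num_worker 1
from bert_serving.client import BertClient

bc = BertClient()  # connects to the locally running server on its default ports
embeddings = bc.encode([
    "The cat sat on the mat.",
    "El gato se sentó en la alfombra.",
])
print(embeddings.shape)  # (2, 768) with BERT-Base and the library's default pooling
```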
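
The Experiment Setup row lists the Generator's layer counts, the λ_Gcls = 10 loss weighting, and the L2 distance measure. The sketch below renders those counts as plain tf.keras dense stacks; interpreting "upsampling"/"downsampling" as dense layers that widen/narrow the hidden size, and the 768/1024 dimensions, are assumptions rather than details from the paper.

```python
# Hedged sketch of the quoted Generator layer counts, loss weights, and L2 distance.
# Assumptions: "upsampling"/"downsampling" = dense layers that widen/narrow the hidden
# size; 768-d sentence embeddings and a 1024-d hidden size are illustrative choices.
import tensorflow as tf

EMB_DIM, HIDDEN_DIM = 768, 1024

def generator_half() -> tf.keras.Sequential:
    """Encoder (or decoder) half: 2 upsampling + 4 feedforward + 2 downsampling layers."""
    layers = [tf.keras.layers.Dense(HIDDEN_DIM, activation="relu") for _ in range(2)]   # upsampling
    layers += [tf.keras.layers.Dense(HIDDEN_DIM, activation="relu") for _ in range(4)]  # feedforward
    layers += [tf.keras.layers.Dense((HIDDEN_DIM + EMB_DIM) // 2, activation="relu"),
               tf.keras.layers.Dense(EMB_DIM, activation=None)]                         # downsampling
    return tf.keras.Sequential(layers)

encoder, decoder = generator_half(), generator_half()  # 8 + 8 = 16 layers in total

def l2_distance(x: tf.Tensor, x_rec: tf.Tensor) -> tf.Tensor:
    """L2 distance between original and reconstructed sentence embeddings."""
    return tf.reduce_mean(tf.norm(x - x_rec, ord="euclidean", axis=-1))

# Loss weights quoted in the row: lambda_Gcls = 10, all other lambdas held at 1.
LAMBDAS = {"adv": 1.0, "Gcls": 10.0, "rec": 1.0}
```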