Unsupervised Interlingual Semantic Representations from Sentence Embeddings for Zero-Shot Cross-Lingual Transfer
Authors: Channy Hong, Jaeyeon Lee, Jung Kwon Lee
AAAI 2020, pp. 7944-7951
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For our experiments, we evaluate the effectiveness of our framework in zero-shot transfer performances on non-English NLI tasks... We report the results of the zero-shot evaluations of our main model in Table 1, alongside those of the baseline model (BERT)... We also report the results of our few-shot scenario in Figure 2... |
| Researcher Affiliation | Collaboration | Channy Hong (1,2), Jaeyeon Lee (2), Jung Kwon Lee (2); 1: Harvard University, 2: Superb AI Inc. |
| Pseudocode | No | The paper describes the proposed method using text and equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide the code implementations for training the ISR encoder as well as the task-specific classifier on top of ISR (github.com/ChannyHong/ISREncoder). |
| Open Datasets | Yes | XNLI. The primary metric we used for zero-shot transfer evaluations was the XNLI benchmark (Conneau et al. 2018)... CNLI. For our few-shot scenario, we use the Chinese NLI (CNLI) dataset... Monolingual Corpora. For our monolingual corpora, we used WikiExtractor to extract the publicly available Wikipedia dumps... |
| Dataset Splits | Yes | The XNLI benchmark provides 2,500 development examples and 5,000 test examples... Furthermore, XNLI provides the machine-translated versions of 392,702 MNLI training examples... CNLI dataset which includes 90,000 training examples, 10,000 development examples, and 10,000 test examples, all in Chinese. |
| Hardware Specification | Yes | Training took about 60 hours on a single Tesla T4 GPU. |
| Software Dependencies | No | Our model is implemented in TensorFlow... For our fixed sentence embeddings, we used the bert-as-service library (Xiao 2018) and specifically its default settings... The paper does not state specific version numbers for these dependencies (a hedged embedding-extraction sketch follows the table). |
| Experiment Setup | Yes | For our Generator, we used 2 upsampling layers, 4 feedforward layers, and 2 downsampling layers for each encoder and decoder (16 layers total)... The Discriminator of our model used 2 shared upsampling layers, 1 shared feedforward layer, 2 shared downsampling layers, and 1 classification layer for each adversarial loss and domain classification loss (6 layers total)... We found that λ^G_cls of 10 worked best for training ISR (while holding all other λ to 1). We adopt the L2 distance as our distance measure between embeddings. |
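
The Software Dependencies row notes that fixed sentence embeddings come from the bert-as-service library under its default settings. Below is a minimal sketch of that extraction step, assuming a BERT-Base-sized checkpoint and bert-as-service defaults; the model path and example sentences are placeholders, and none of this is taken from the authors' released code.

```python
# Hedged sketch (not the authors' code): obtaining fixed sentence embeddings with
# bert-as-service default settings. A bert-serving server must be running, e.g.:
#   bert-serving-start -model_dir /path/to/a/bert/checkpoint -num_worker=1
# The model path and example sentences below are placeholders, not values from the paper.
from bert_serving.client import BertClient

client = BertClient()  # connects to a local bert-serving server on its default ports

sentences = [
    "The cat sat on the mat.",
    "Le chat est assis sur le tapis.",
]

# With default settings each sentence becomes a fixed 768-dimensional vector for
# BERT-Base checkpoints (mean-pooled hidden states); these fixed embeddings are the
# inputs on which the ISR encoder is trained.
embeddings = client.encode(sentences)
print(embeddings.shape)  # expected: (2, 768)
```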
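
The Experiment Setup row gives layer counts for the Generator (2 upsampling, 4 feedforward, and 2 downsampling layers per encoder and decoder) and names the L2 distance as the distance measure between embeddings. The TensorFlow sketch below shows one plausible reading of that description; the hidden widths, activations, EMB_DIM value, and the build_generator_half helper are assumptions for illustration, not details from the paper, and the Discriminator and loss weighting are left out.

```python
# Hedged sketch (not the authors' code) of the Generator layer layout reported above.
# "Upsampling"/"downsampling" are read as dense layers that widen/narrow the vector;
# EMB_DIM, the hidden widths, and the activations are illustrative assumptions.
import tensorflow as tf

EMB_DIM = 768  # bert-as-service default embedding size for BERT-Base (assumption)


def build_generator_half(name: str) -> tf.keras.Model:
    """One half of the Generator (encoder or decoder): 2 upsampling, 4 feedforward,
    and 2 downsampling layers, matching the counts quoted in the table (8 per half,
    16 in total for the Generator)."""
    inputs = tf.keras.Input(shape=(EMB_DIM,))
    x = inputs
    for units in (1024, 2048):                              # 2 "upsampling" layers
        x = tf.keras.layers.Dense(units, activation="relu")(x)
    for _ in range(4):                                      # 4 feedforward layers
        x = tf.keras.layers.Dense(2048, activation="relu")(x)
    x = tf.keras.layers.Dense(1024, activation="relu")(x)   # 2 "downsampling" layers
    x = tf.keras.layers.Dense(EMB_DIM)(x)                   # linear output (assumption)
    return tf.keras.Model(inputs, x, name=name)


encoder = build_generator_half("isr_encoder")  # sentence embedding -> ISR
decoder = build_generator_half("isr_decoder")  # ISR -> sentence embedding
# Omitted here: the Discriminator, any target-language conditioning of the decoder,
# and the λ-weighted loss combination (λ^G_cls = 10, all other λ held at 1).


def l2_distance(a: tf.Tensor, b: tf.Tensor) -> tf.Tensor:
    """L2 distance between embeddings, the distance measure named in the paper."""
    return tf.norm(a - b, ord="euclidean", axis=-1)


# Toy usage with random vectors standing in for fixed sentence embeddings.
batch = tf.random.normal((4, EMB_DIM))
print(l2_distance(batch, decoder(encoder(batch))).shape)  # -> (4,)
```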