Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization
Authors: Wataru Hirota, Yoshihiko Suhara, Behzad Golshan, Wang-Chiew Tan (pp. 7935–7943)
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results based on several language pairs show that our specialized embeddings outperform the state-of-the-art multilingual sentence embedding model on the task of cross-lingual intent classification using only monolingual labeled data. |
| Researcher Affiliation | Collaboration | ¹Osaka University, ²Megagon Labs; w-hirota@ist.osaka-u.ac.jp, {yoshi, behzad, wangchiew}@megagon.ai |
| Pseudocode | Yes | Algorithm 1 shows a single training step of EMU. |
| Open Source Code | Yes | Our code is available at https://github.com/megagonlabs/emu. |
| Open Datasets | Yes | ATIS (Hemphill, Godfrey, and Doddington 1990) is a publicly available corpus for spoken dialog systems and is widely used for intent classification research. ... Quora is a publicly available paraphrase detection dataset that contains over 400k questions with duplicate labels. |
| Dataset Splits | No | The paper mentions splitting data into training and test sets, but does not explicitly describe a separate validation set split or its size/proportion. It states, 'We split the dataset into training and test sets so that the sentences used for fine-tuning do not appear in the test set.' |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., specific CPU/GPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions PyTorch and using 'the official implementation of LASER' but does not provide specific version numbers for PyTorch or other software dependencies. |
| Experiment Setup | Yes | We used an initial learning rate of 10⁻³ and optimized the model with Adam. We used a batch size of 16. For our proposed methods, we set α = 50 and λ = 10⁻⁴. All the models were trained for 3 epochs. The architecture of language discriminator D has two 900-dimensional fully-connected layers with a dropout rate of 0.2. The hyperparameters were γ = 10⁻⁴, k = 5, c = 0.01, respectively. The language discriminator was also optimized with Adam with an initial learning rate of 5.0 × 10⁻⁴. |
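
For readers who want to sanity-check the reported experiment setup, the sketch below shows one way the quoted hyperparameters might be wired up in PyTorch (the framework named in the Software Dependencies row). Only the numbers come from the table above; the encoder/classifier stand-ins, the embedding dimension, and the class counts are assumptions, not the authors' implementation — the real code is in the repository linked in the Open Source Code row.

```python
# Minimal sketch of the reported training configuration, assuming a generic
# encoder/classifier/discriminator setup. Not the authors' implementation.
import torch
import torch.nn as nn

EMBED_DIM = 1024      # assumption: dimensionality of the sentence embeddings
NUM_INTENTS = 10      # assumption: placeholder number of intent classes
NUM_LANGUAGES = 2     # assumption: placeholder number of languages

class LanguageDiscriminator(nn.Module):
    """Two 900-dimensional fully-connected layers with dropout 0.2,
    as stated in the Experiment Setup row (output head is an assumption)."""
    def __init__(self, in_dim=EMBED_DIM, hidden=900, num_languages=NUM_LANGUAGES, p=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Dropout(p),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Dropout(p),
            nn.Linear(hidden, num_languages),
        )

    def forward(self, x):
        return self.net(x)

# Hypothetical stand-ins for the specialized encoder and intent classifier.
encoder = nn.Linear(EMBED_DIM, EMBED_DIM)
classifier = nn.Linear(EMBED_DIM, NUM_INTENTS)
discriminator = LanguageDiscriminator()

# Optimizer settings quoted from the paper: Adam with lr 1e-3 for the model,
# Adam with lr 5e-4 for the language discriminator, batch size 16, 3 epochs.
model_opt = torch.optim.Adam(
    list(encoder.parameters()) + list(classifier.parameters()), lr=1e-3)
disc_opt = torch.optim.Adam(discriminator.parameters(), lr=5e-4)

BATCH_SIZE = 16
EPOCHS = 3
ALPHA = 50      # α, reported value; its exact role depends on the Emu loss definition
LAMBDA = 1e-4   # λ, reported weight (likely for the adversarial term; assumption)
GAMMA = 1e-4    # γ, k, c are reported but their roles in the adversarial
K_STEPS = 5     # training procedure are not spelled out in the quoted text,
CLIP_C = 0.01   # so they are kept here only as named constants.
```

The block is only a configuration scaffold: it instantiates the components with the reported sizes and hyperparameters, but the actual Emu training step (Algorithm 1 in the paper) should be taken from the official repository rather than reconstructed from this table.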