Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization
Authors: Wataru Hirota, Yoshihiko Suhara, Behzad Golshan, Wang-Chiew Tan (pp. 7935-7943)
AAAI 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results based on several language pairs show that our specialized embeddings outperform the state-of-the-art multilingual sentence embedding model on the task of cross-lingual intent classification using only monolingual labeled data. |
| Researcher Affiliation | Collaboration | Osaka University, Megagon Labs |
| Pseudocode | Yes | Algorithm 1 shows a single training step of EMU. |
| Open Source Code | Yes | Our code is available at https://github.com/megagonlabs/emu. |
| Open Datasets | Yes | ATIS (Hemphill, Godfrey, and Doddington 1990) is a publicly available corpus for spoken dialog systems and is widely used for intent classification research. ... Quora is a publicly available paraphrase detection dataset that contains over 400k questions with duplicate labels. |
| Dataset Splits | No | The paper mentions splitting data into training and test sets, but does not explicitly describe a separate validation set split or its size/proportion. It states, 'We split the dataset into training and test sets so that the sentences used for fine-tuning do not appear in the test set.' |
| Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., specific CPU/GPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions 'PyTorch' and using 'the official implementation of LASER' but does not provide specific version numbers for PyTorch or other software dependencies. |
| Experiment Setup | Yes | We used an initial learning rate of 10⁻³ and optimized the model with Adam. We used a batch size of 16. For our proposed methods, we set α = 50 and λ = 10⁻⁴. All the models were trained for 3 epochs. The architecture of language discriminator D has two 900-dimensional fully-connected layers with a dropout rate of 0.2. The hyperparameters were γ = 10⁻⁴, k = 5, c = 0.01 respectively. The language discriminator was also optimized with Adam with an initial learning rate of 5.0 × 10⁻⁴. |
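
For readers who want to reuse the settings quoted in the Experiment Setup row, the sketch below collects them in one Python configuration object. It is only an illustration: the class name `EmuSetup` and all field names are hypothetical and are not taken from the paper or its repository, and the small values are written with negative exponents (for example `1e-3`) on the assumption that the minus signs were lost when the text was extracted.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EmuSetup:
    """Hyperparameters quoted in the Experiment Setup row.

    Field names are hypothetical; exponent signs are assumed negative.
    """
    # Main model, optimized with Adam
    learning_rate: float = 1e-3        # initial learning rate
    batch_size: int = 16
    epochs: int = 3
    alpha: float = 50.0                # α
    lam: float = 1e-4                  # λ

    # Language discriminator D, also optimized with Adam
    disc_num_layers: int = 2           # fully-connected layers
    disc_hidden_dim: int = 900         # units per layer
    disc_dropout: float = 0.2
    disc_learning_rate: float = 5.0e-4
    gamma: float = 1e-4                # γ
    k: int = 5
    c: float = 0.01


if __name__ == "__main__":
    # Print each setting on its own line for a quick visual check.
    for name, value in asdict(EmuSetup()).items():
        print(f"{name} = {value}")
```

Freezing the dataclass keeps the quoted values from being changed by accident; a variant run can be expressed with `dataclasses.replace(EmuSetup(), batch_size=32)`, where the value 32 is purely an example.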