Emu: Enhancing Multilingual Sentence Embeddings with Semantic Specialization

Authors: Wataru Hirota, Yoshihiko Suhara, Behzad Golshan, Wang-Chiew Tan

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'Our experimental results based on several language pairs show that our specialized embeddings outperform the state-of-the-art multilingual sentence embedding model on the task of cross-lingual intent classification using only monolingual labeled data.'
Researcher Affiliation | Collaboration | Osaka University, Megagon Labs; w-hirota@ist.osaka-u.ac.jp, {yoshi, behzad, wangchiew}@megagon.ai
Pseudocode | Yes | 'Algorithm 1 shows a single training step of EMU.'
Open Source Code | Yes | 'Our code is available at https://github.com/megagonlabs/emu.'
Open Datasets | Yes | 'ATIS (Hemphill, Godfrey, and Doddington 1990) is a publicly available corpus for spoken dialog systems and is widely used for intent classification research. ... Quora is a publicly available paraphrase detection dataset that contains over 400k questions with duplicate labels.'
Dataset Splits | No | The paper mentions splitting data into training and test sets but does not explicitly describe a separate validation split or its size/proportion. It states, 'We split the dataset into training and test sets so that the sentences used for fine-tuning do not appear in the test set.'
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., specific CPU/GPU models, memory, or cloud instance types).
Software Dependencies | No | The paper mentions PyTorch and using 'the official implementation of LASER' but does not provide specific version numbers for PyTorch or other software dependencies.
Experiment Setup | Yes | 'We used an initial learning rate of 10^-3 and optimized the model with Adam. We used a batch size of 16. For our proposed methods, we set α = 50 and λ = 10^-4. All the models were trained for 3 epochs. The architecture of language discriminator D has two 900-dimensional fully-connected layers with a dropout rate of 0.2. The hyperparameters were γ = 10^-4, k = 5, c = 0.01, respectively. The language discriminator was also optimized with Adam with an initial learning rate of 5.0 × 10^-4.'
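
To make the quoted experiment setup easier to check against the released code, here is a minimal PyTorch sketch of the reported configuration (the paper names PyTorch but gives no version). Only the layer sizes, dropout rate, optimizer choice, learning rates, batch size, epoch count, and the values of α, λ, γ, k, and c come from the quote; the ReLU activations, the discriminator's single-unit output head, the 1024-dimensional LASER input size, the placeholder encoder module, and the readings of k (discriminator steps per encoder step) and c (a clipping constant) are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

EMB_DIM = 1024  # LASER sentence embeddings are 1024-dimensional

# Language discriminator D: two 900-dimensional fully-connected layers
# with a dropout rate of 0.2, as reported. The ReLU activations and the
# single-unit output head are assumptions; the paper does not detail them.
discriminator = nn.Sequential(
    nn.Linear(EMB_DIM, 900),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(900, 900),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(900, 1),
)

# Hyperparameter values quoted from the paper. The roles of K and C are
# assumed readings (discriminator steps per encoder step, and a clipping
# constant), not stated interpretations from the quote.
ALPHA = 50
LAMBDA = 1e-4
GAMMA = 1e-4
K = 5
C = 0.01
BATCH_SIZE = 16
EPOCHS = 3

# Placeholder for the specialized sentence encoder; the real model is
# initialized from LASER rather than a single linear layer.
encoder = nn.Linear(EMB_DIM, EMB_DIM)

# Adam with the reported initial learning rates: 10^-3 for the model,
# 5.0 x 10^-4 for the language discriminator.
encoder_optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
discriminator_optimizer = torch.optim.Adam(discriminator.parameters(), lr=5e-4)
```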