Analogy Training Multilingual Encoders
Authors: Nicolas Garneau, Mareike Hartmann, Anders Sandholm, Sebastian Ruder, Ivan Vulić, Anders Søgaard
AAAI 2021, pp. 12884-12892 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extract a large-scale multilingual, multi-word analogy dataset from Wikidata for diagnosing and correcting for global inconsistencies and implement a four-way Siamese BERT architecture for grounding multilingual BERT (mBERT) in Wikidata through analogy training. We show that analogy training not only improves the global consistency of mBERT and the isomorphism of language-specific subspaces, but also leads to significant gains on downstream tasks such as bilingual dictionary induction and sentence retrieval. A minimal sketch of the four-way Siamese setup is shown after this table. |
| Researcher Affiliation | Collaboration | Université Laval, University of Copenhagen, Google Research, DeepMind, University of Cambridge |
| Pseudocode | No | The paper describes algorithms and presents mathematical formulas for losses, but it does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code is available here: https://github.com/coastalcph/sentencetransformers-for-analogies |
| Open Datasets | Yes | We extract a large-scale multilingual, multi-word analogy dataset from Wikidata for diagnosing and correcting for global inconsistencies and implement a four-way Siamese BERT architecture for grounding multilingual BERT (m BERT) in Wikidata through analogy training. |
| Dataset Splits | No | The paper states: 'We provide each of these versions with standard training, validation and evaluation sections.' However, it does not provide specific percentages or counts for these splits. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU, CPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like 'fastText' and 'BERT' variants but does not provide specific version numbers for any libraries or dependencies. |
| Experiment Setup | No | The paper describes the loss functions and the use of aliases/descriptions in training, but it does not provide concrete numerical hyperparameters such as learning rate, batch size, number of epochs, or optimizer details. |
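
The sketch below illustrates the four-way Siamese idea described above: a single shared mBERT encoder is applied to all four parts of an analogy a:b :: c:d, and a loss ties the two offset vectors together. The model name, mean pooling, and the MSE offset loss are illustrative assumptions, not details confirmed by the paper, which defines its own losses over Wikidata aliases and descriptions; the authors' implementation is in the linked sentencetransformers-for-analogies repository.

```python
# Minimal sketch of a four-way Siamese encoder for analogy training.
# Assumptions (not taken from the paper): mean pooling over token embeddings
# and an MSE loss between the two offset vectors (b - a) and (d - c).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"  # mBERT, the encoder grounded in the paper

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)  # one encoder shared by all four inputs


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).float()
    return (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)


def encode(texts: list) -> torch.Tensor:
    """Encode a batch of (possibly multi-word) analogy terms with the shared encoder."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = encoder(**batch)
    return mean_pool(out.last_hidden_state, batch["attention_mask"])


def analogy_loss(a, b, c, d) -> torch.Tensor:
    """For analogies a:b :: c:d, push the offset (b - a) towards (d - c).

    This MSE formulation is an illustrative stand-in for the paper's losses.
    """
    va, vb, vc, vd = encode(a), encode(b), encode(c), encode(d)
    return nn.functional.mse_loss(vb - va, vd - vc)


# Hypothetical analogy quadruple (entity names are illustrative only).
loss = analogy_loss(["Paris"], ["France"], ["Berlin"], ["Germany"])
loss.backward()  # gradients flow into the single shared mBERT encoder
```

In the paper itself, the analogy quadruples are extracted from Wikidata across languages and the loss formulation differs; the snippet only shows the Siamese aspect, namely that one shared encoder receives gradients from all four inputs of each analogy.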