Analogy Training Multilingual Encoders

Authors: Nicolas Garneau, Mareike Hartmann, Anders Sandholm, Sebastian Ruder, Ivan Vulić, Anders Søgaard (pp. 12884-12892)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extract a large-scale multilingual, multi-word analogy dataset from Wikidata for diagnosing and correcting for global inconsistencies and implement a four-way Siamese BERT architecture for grounding multilingual BERT (mBERT) in Wikidata through analogy training. We show that analogy training not only improves the global consistency of mBERT, as well as the isomorphism of language-specific subspaces, but also leads to significant gains on downstream tasks such as bilingual dictionary induction and sentence retrieval. (A minimal sketch of such a setup appears below the table.)
Researcher Affiliation | Collaboration | 1 Université Laval, 2 University of Copenhagen, 3 Google Research, 4 DeepMind, 5 University of Cambridge
Pseudocode | No | The paper describes algorithms and presents mathematical formulas for losses, but it does not include pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code is available here: https://github.com/coastalcph/sentencetransformers-for-analogies
Open Datasets | Yes | We extract a large-scale multilingual, multi-word analogy dataset from Wikidata for diagnosing and correcting for global inconsistencies and implement a four-way Siamese BERT architecture for grounding multilingual BERT (mBERT) in Wikidata through analogy training.
Dataset Splits | No | The paper states: 'We provide each of these versions with standard training, validation and evaluation sections.' However, it does not provide specific percentages or counts for these splits.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU, CPU models, memory, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper mentions software like 'fastText' and 'BERT' variants but does not provide specific version numbers for any libraries or dependencies.
Experiment Setup | No | The paper describes the loss functions and the use of aliases/descriptions in training, but it does not provide concrete numerical hyperparameters such as learning rate, batch size, number of epochs, or optimizer details.
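The "Research Type" row above mentions a four-way Siamese BERT architecture trained on analogy quadruples. The snippet below is a minimal, hypothetical sketch of what such a setup could look like: a single shared mBERT encoder embeds the four analogy members a : b :: c : d, and a squared-offset loss pushes the offset b - a towards d - c. The pooling strategy, the exact loss form, and all function names here are assumptions made for illustration; the authors' actual implementation is in the linked repository.

```python
# Hypothetical sketch of a four-way Siamese analogy encoder.
# The pooling, loss, and identifiers below are illustrative assumptions;
# see https://github.com/coastalcph/sentencetransformers-for-analogies
# for the authors' implementation.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"  # mBERT
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)  # one encoder shared across all four inputs


def embed(texts):
    """Mean-pooled mBERT embeddings for a batch of strings."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state            # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)            # (B, H)


def analogy_loss(a, b, c, d):
    """One possible analogy objective: make the offset b - a match d - c."""
    e_a, e_b, e_c, e_d = embed(a), embed(b), embed(c), embed(d)
    return ((e_b - e_a) - (e_d - e_c)).pow(2).sum(-1).mean()


# Toy quadruple a : b :: c : d; in the paper the quadruples are multilingual
# entity aliases and descriptions extracted from Wikidata.
loss = analogy_loss(["Paris"], ["France"], ["Berlin"], ["Germany"])
loss.backward()  # gradients flow into the single shared encoder
```

The toy quadruple above only illustrates how all four inputs are encoded by the same weights and how the analogy loss ties them together; it is not a claim about the paper's exact training data or objective.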