Zero-Shot Neural Transfer for Cross-Lingual Entity Linking

Authors: Shruti Rijhwani, Jiateng Xie, Graham Neubig, Jaime Carbonell

AAAI 2019, pp. 6924-6931

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "With experiments on 9 low-resource languages and transfer through a total of 54 languages, we show that our proposed pivot-based framework improves entity linking accuracy 17% (absolute) on average over the baseline systems, for the zero-shot scenario."
Researcher Affiliation | Academia | "Shruti Rijhwani, Jiateng Xie, Graham Neubig, Jaime Carbonell, Language Technologies Institute, Carnegie Mellon University, {srijhwan, jiatengx, gneubig, jgc}@cs.cmu.edu"
Pseudocode | No | The paper describes models and architectures (Figure 2, Figure 3) but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Code and data are available at https://github.com/neulab/pivotbased-entity-linking"
Open Datasets | Yes | "We primarily use Wikipedia links as a test bed because of the availability of data in many LRLs, not found in traditional EL datasets. [...] We also test our proposed PBEL method on the standard cross-lingual EL setting of linking textual mentions to KB entries. For the test set, we use annotated documents from the DARPA LORELEI program, in two extremely low-resource languages Tigrinya and Oromo." (Footnote: https://www.darpa.mil/program/low-resource-languages-for-emergent-incidents)
Dataset Splits | No | The paper mentions training data and test sets, but does not explicitly specify validation splits (e.g., percentages or counts) or a cross-validation setup required for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | "The encoder is implemented in DyNet (Neubig et al. 2017), with a character embedding size of 64 and LSTM hidden layer size of 512." While DyNet is mentioned, no version number is given for it, nor are other key software components listed with versions.
Experiment Setup | Yes | "The encoder is implemented in DyNet (Neubig et al. 2017), with a character embedding size of 64 and LSTM hidden layer size of 512." [...] "Since we want to efficiently train a model that can rank KB entries for a given mention, we follow existing work and use negative sampling with a max-margin loss for training the encoder (Collobert et al. 2011)."
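
To make the quoted setup concrete, below is a minimal sketch of a character-level BiLSTM mention/entity encoder with the stated dimensions (character embeddings of size 64, LSTM hidden size 512), trained with negative sampling and a max-margin (hinge) loss. The paper's implementation is in DyNet; this PyTorch version, along with the cosine-similarity scoring, the margin value, and all class and function names, is an illustrative assumption rather than the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharLSTMEncoder(nn.Module):
    """Character-level BiLSTM encoder for mention and KB-entry strings (sketch)."""
    def __init__(self, n_chars, emb_dim=64, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb_dim, padding_idx=0)
        # Each direction gets hidden_dim // 2 units so the concatenated
        # forward/backward final states have the stated size of 512.
        self.lstm = nn.LSTM(emb_dim, hidden_dim // 2,
                            batch_first=True, bidirectional=True)

    def forward(self, char_ids):                      # (batch, max_len) of char indices
        _, (h_n, _) = self.lstm(self.embed(char_ids))
        # Concatenate the final forward and backward hidden states.
        return torch.cat([h_n[0], h_n[1]], dim=-1)    # (batch, hidden_dim)

def max_margin_loss(encoder, mentions, gold_entries, negative_entries, margin=0.5):
    """Negative-sampling hinge loss: the gold KB entry should outscore each
    sampled negative entry by at least `margin` (margin value is assumed)."""
    m = F.normalize(encoder(mentions), dim=-1)
    pos = F.normalize(encoder(gold_entries), dim=-1)
    neg = F.normalize(encoder(negative_entries), dim=-1)
    pos_score = (m * pos).sum(-1)                     # cosine similarity with gold entry
    neg_score = (m * neg).sum(-1)                     # cosine similarity with sampled negative
    return F.relu(margin - pos_score + neg_score).mean()
```

At inference time, a mention would be linked by encoding it and every candidate KB entry with the same encoder and selecting the highest-scoring entry, matching the ranking objective described in the quote.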