Enhancing Bilingual Lexicon Induction via Bi-directional Translation Pair Retrieving

Authors: Qiuyu Ding, Hailong Cao, Tiejun Zhao

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "On a benchmark dataset of BLI, our proposed method achieves competitive performance compared to existing state-of-the-art (SOTA) methods. It demonstrates effectiveness and robustness across six experimental languages, including similar language pairs and distant language pairs, under both supervised and unsupervised settings. ... To evaluate the effectiveness of our method, we perform a comprehensive set of BLI experiments on the standard BLI benchmark." (An evaluation sketch follows the table.)
Researcher Affiliation | Academia | "Qiuyu Ding, Hailong Cao*, Tiejun Zhao (Harbin Institute of Technology); qiuyuding@stu.hit.edu.cn, caohailong@hit.edu.cn, tjzhao@hit.edu.cn"
Pseudocode | No | The paper describes the method using text and a diagram, but it does not include a dedicated pseudocode block or algorithm listing.
Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described method is publicly available.
Open Datasets | Yes | "We use the widely used MUSE dataset (Lample et al. 2018), which consists of 300-dim embeddings pre-trained with fastText (Bojanowski et al. 2017) on the monolingual corpora of full Wikipedias for each language; the vocabularies are trimmed to the 200k most frequent words. We also employ the test sets released by Lample et al. (2018), which are widely used in BLI evaluations." (A loading sketch follows the table.)
Dataset Splits | No | The paper states that '5k translation pairs are used as seed lexicon D0' for training and that the MUSE test sets are used for evaluation, but it does not specify a separate validation split or how one should be constructed. (A possible workaround is sketched after the table.)
Hardware Specification | Yes | "All experiments are performed on a single Nvidia RTX A6000."
Software Dependencies | No | The paper mentions general software such as fastText and several baseline BLI systems, but it does not provide version numbers for the software dependencies used in its own implementation.
Experiment Setup | Yes | "We select best hyperparameters by searching a combination of λ, n, m with the following range: λ: {0.05, 0.1, ..., 1.0} with 0.05 step size; n, m: {3, 4, ..., 20} with 1 step size." (A grid-search sketch follows the table.)
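
For the Research Type row: the reported experiments follow the standard BLI protocol, in which each source word is translated by nearest-neighbor retrieval in the target embedding space and scored with Precision@1. A minimal sketch of that evaluation, assuming L2-normalized embedding matrices and a gold test dictionary; all names here are illustrative, not taken from the paper:

```python
import numpy as np

def precision_at_1(src_emb: np.ndarray, tgt_emb: np.ndarray,
                   test_dict: dict[int, set[int]]) -> float:
    """Fraction of test source words whose nearest target word is a gold translation."""
    hits = 0
    for src_id, gold_ids in test_dict.items():
        # On L2-normalized vectors, cosine similarity is a plain dot product.
        scores = tgt_emb @ src_emb[src_id]
        if int(np.argmax(scores)) in gold_ids:
            hits += 1
    return hits / len(test_dict)
```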
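
For the Open Datasets row: the MUSE embeddings ship as plain-text fastText vectors (one "word v1 ... v300" row per line after a header), stored in frequency order, so trimming to the 200k most frequent words amounts to reading the first 200k rows. A loading sketch under those assumptions; the file path and function name are hypothetical:

```python
import numpy as np

def load_muse_vectors(path: str, max_vocab: int = 200_000):
    """Read fastText text-format vectors, keeping the first `max_vocab` entries."""
    words, vectors = [], []
    with open(path, encoding="utf-8", errors="ignore") as f:
        next(f)  # skip the "<vocab_size> <dim>" header line
        for i, line in enumerate(f):
            if i >= max_vocab:  # rows are ordered by corpus frequency
                break
            token, *values = line.rstrip().split(" ")
            words.append(token)
            vectors.append(np.asarray(values, dtype=np.float32))
    return words, np.stack(vectors)  # e.g. shape (200000, 300) for MUSE
```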
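
For the Dataset Splits row: since the paper reports only the 5k-pair seed lexicon D0 and the MUSE test sets, anyone reproducing the work must improvise a validation set. One common workaround (an assumption here, not the authors' protocol) is to hold out a fraction of the seed pairs:

```python
import random

def split_seed_lexicon(pairs, val_fraction=0.1, seed=42):
    """Split (source, target) seed pairs into train and validation subsets."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)  # fixed seed for reproducibility
    n_val = int(len(pairs) * val_fraction)
    return pairs[n_val:], pairs[:n_val]  # (train, validation)
```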
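
For the Experiment Setup row: the reported search enumerates every (λ, n, m) combination, i.e. 20 × 18 × 18 = 6480 configurations. A sketch of that grid search; only the ranges come from the paper, and the `evaluate` callback (e.g. accuracy on a held-out dictionary) is a placeholder:

```python
import itertools

LAMBDAS = [round(0.05 * k, 2) for k in range(1, 21)]  # 0.05, 0.10, ..., 1.00
NS = range(3, 21)                                     # 3, 4, ..., 20
MS = range(3, 21)                                     # 3, 4, ..., 20

def grid_search(evaluate):
    """Return the best (lambda, n, m) combination and its score under `evaluate`."""
    best_cfg, best_score = None, float("-inf")
    for lam, n, m in itertools.product(LAMBDAS, NS, MS):
        score = evaluate(lam, n, m)
        if score > best_score:
            best_cfg, best_score = (lam, n, m), score
    return best_cfg, best_score
```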