Unsupervised Bilingual Lexicon Induction from Mono-Lingual Multimodal Data

Authors: Shizhe Chen, Qin Jin, Alexander Hauptmann

AAAI 2019 (pp. 8207-8214) | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on multiple language pairs demonstrate the effectiveness of our proposed method, which substantially outperforms previous vision-based approaches without using any parallel sentences or supervision of seed word pairs.
Researcher Affiliation | Academia | 1: School of Information, Renmin University of China, Beijing, China; 2: Language Technology Institute, Carnegie Mellon University, Pittsburgh, USA
Pseudocode | Yes | Algorithm 1: Generating localized visual features. (A hedged sketch of one possible reading of this step appears after the table.)
Open Source Code | No | The paper does not provide an explicit statement or link for open-sourcing its own code. It mentions a GitHub link in a footnote for a ground-truth bilingual dictionary, not for the method's code.
Open Datasets | Yes | For image captioning, we utilize the multi30k (Elliott et al. 2016), COCO (Chen et al. 2015) and STAIR (Yoshikawa, Shigeto, and Takeuchi 2017) datasets. For bilingual lexicon induction, we use two visual datasets: BERGSMA and MMID. The BERGSMA dataset (Bergsma and Van Durme 2011)... The MMID dataset (Hewitt et al. 2018)...
Dataset Splits | No | The caption model is trained up to 100 epochs and the best model is selected according to caption performance on the validation set. However, specific percentages or counts for this validation split are not provided, nor is a reference to a standard, detailed split.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | We use Moses SMT Toolkit to tokenize sentences... No version number is provided for the Moses SMT Toolkit or any other software dependency. (A hedged tokenization example follows the table.)
Experiment Setup | Yes | For the multi-lingual caption model, we set the word embedding size and the hidden size of LSTM as 512. Adam algorithm is applied to optimize the model with learning rate of 0.0001 and batch size of 128. The caption model is trained up to 100 epochs... (A configuration sketch based on these reported values appears below.)
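The table cites the paper's Algorithm 1 (generating localized visual features) but does not reproduce it. Below is a minimal, hypothetical sketch of what such a procedure could look like, assuming grid-level CNN features and per-word attention weights produced by a caption decoder; all function and variable names are illustrative and this is not the authors' code.

```python
# Hypothetical sketch: aggregate attention-weighted CNN grid features into a
# "localized" visual vector per word type. NOT the paper's exact Algorithm 1.
import numpy as np

def localized_feature(grid_feats, attn_weights):
    """grid_feats: (R, D) region features; attn_weights: (R,) softmax weights.
    Returns the (D,) attention-weighted average feature."""
    return attn_weights @ grid_feats

def word_visual_features(samples):
    """samples: iterable of (word, grid_feats, attn_weights) triples collected
    while decoding captions. Returns the mean localized feature per word type."""
    sums, counts = {}, {}
    for word, feats, attn in samples:
        v = localized_feature(feats, attn)
        sums[word] = sums.get(word, 0.0) + v
        counts[word] = counts.get(word, 0) + 1
    return {w: sums[w] / counts[w] for w in sums}
```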
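The paper names the Moses SMT Toolkit for sentence tokenization but gives no version. As a reproducibility aid, the snippet below shows equivalent tokenization via sacremoses, a Python port of the Moses tokenizer scripts; this is a stand-in, not necessarily the pipeline the authors used.

```python
# Stand-in for the Moses tokenizer scripts (pip install sacremoses).
from sacremoses import MosesTokenizer

mt = MosesTokenizer(lang="en")
tokens = mt.tokenize("A man is riding a horse.", escape=False)
print(tokens)  # ['A', 'man', 'is', 'riding', 'a', 'horse', '.']
```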
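The reported hyperparameters (word embedding and LSTM hidden size 512, Adam with learning rate 0.0001, batch size 128, training up to 100 epochs) can be captured in a short configuration sketch. The model below is a generic LSTM captioner stand-in written in PyTorch; the vocabulary size and any architecture detail beyond the reported sizes are assumptions, not the authors' implementation.

```python
# Sketch of the reported training configuration; only the sizes marked
# "reported" come from the paper, everything else is an assumption.
import torch
import torch.nn as nn

VOCAB_SIZE = 10000   # assumption: not reported in this section
EMBED_SIZE = 512     # word embedding size (reported)
HIDDEN_SIZE = 512    # LSTM hidden size (reported)
BATCH_SIZE = 128     # reported
LR = 1e-4            # reported
MAX_EPOCHS = 100     # reported

class Captioner(nn.Module):
    """Generic LSTM caption decoder stand-in."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_SIZE)
        self.lstm = nn.LSTM(EMBED_SIZE, HIDDEN_SIZE, batch_first=True)
        self.proj = nn.Linear(HIDDEN_SIZE, VOCAB_SIZE)

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))
        return self.proj(h)

model = Captioner()
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
# A training loop would run up to MAX_EPOCHS and keep the checkpoint with the
# best validation caption performance, as the paper describes.
```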