Knowledge Aware Semantic Concept Expansion for Image-Text Matching

Authors: Botian Shi, Lei Ji, Pan Lu, Zhendong Niu, Nan Duan

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments are conducted on Flickr30K and MSCOCO datasets, and prove that our model achieves state-of-the-art results due to the effectiveness of incorporating the external SCG."
Researcher Affiliation | Collaboration | (1) Beijing Institute of Technology; (2) Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; (3) Natural Language Computing, Microsoft Research Asia, Beijing, China; (4) University of California, Los Angeles
Pseudocode | Yes | "Algorithm 1 Concept Expansion" (a hedged sketch of this expansion step appears after the table)
Open Source Code | No | The paper links only to "real showcases of retrieval" in the case-study section ("please check out this link: https://goo.gl/izcSN9."); these are visualized results, not the source code for the method.
Open Datasets | Yes | Visual Genome [Krishna et al., 2017], MSCOCO [Lin et al., 2014], Flickr30K [Young et al., 2014]
Dataset Splits | Yes | MSCOCO: "We follow [Karpathy and Fei-Fei, 2015] to prepare the training, validation and test dataset by splitting all images to 113,287 (for training), 5,000 (for validation) and 5,000 (for test)." Flickr30K: "We followed the split in [Karpathy and Fei-Fei, 2015] and [Faghri et al., 2017] that used 1,000 images for testing and 1,000 images for validation and the rest of them (28,783 images) for training." (a split-loading sketch appears after the table)
Hardware Specification | No | The paper mentions models like LSTM and VGG19, but does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using LSTM, VGG19, ImageNet, and the Adam optimizer, but does not provide version numbers for any software libraries or frameworks (e.g., PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | "The dimension of concept-enhanced image representation and text representation is e = 512. We used λ1 = 5.0, λ2 = 1.0, λ3 = 1.5 and λ4 = 0.05 as the hyper-parameters of the loss function. An Adam Optimizer was adopted to optimize the model's parameters." (a hedged setup sketch appears after the table)
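
The "Algorithm 1 Concept Expansion" entry above concerns expanding the concepts detected in an image with correlated neighbors drawn from the external semantic concept graph (SCG). Below is a minimal illustrative sketch, not the authors' algorithm: the dict-of-dicts graph format, the additive scoring, and the top_k cutoff are all assumptions.

    def expand_concepts(detected, scg, top_k=5):
        """Expand detected concepts with their most correlated SCG neighbors.

        detected: list of concept strings found in an image or caption
        scg: dict mapping concept -> {neighbor: correlation score}
        """
        scores = {}
        for concept in detected:
            for neighbor, score in scg.get(concept, {}).items():
                if neighbor not in detected:
                    # Accumulate correlation evidence across all seed concepts.
                    scores[neighbor] = scores.get(neighbor, 0.0) + score
        # Keep only the top_k most strongly correlated candidates.
        expanded = sorted(scores, key=scores.get, reverse=True)[:top_k]
        return detected + expanded

For example, expand_concepts(["dog", "frisbee"], scg) would add concepts such as "grass" or "park" whenever those co-occur strongly with the seeds in the SCG.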
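
The dataset-split counts quoted above match the widely used Karpathy splits. The sketch below shows one common way to materialize them, assuming a Karpathy-style annotation file named dataset_coco.json in which every image record carries a "split" field; the file name and schema are assumptions, since the paper does not describe its loading code.

    import json

    # Hypothetical Karpathy-split annotation file.
    with open("dataset_coco.json") as f:
        images = json.load(f)["images"]

    splits = {"train": [], "val": [], "test": []}
    for img in images:
        # Karpathy-style files mark 30,504 MSCOCO images as "restval";
        # folding them into training yields the 113,287 figure above.
        split = "train" if img["split"] == "restval" else img["split"]
        splits[split].append(img)

    print({k: len(v) for k, v in splits.items()})
    # Expected for MSCOCO: {'train': 113287, 'val': 5000, 'test': 5000}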
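
The experiment-setup entry fully specifies the loss weights and the optimizer choice. Below is a minimal PyTorch-style sketch of how those reported values might be wired up; the four individual loss terms, the placeholder model, and the learning rate are assumptions, while the weights, the e = 512 dimension, and the use of Adam come from the paper.

    import torch

    EMBED_DIM = 512                      # dimension e of the joint representations
    LAMBDAS = (5.0, 1.0, 1.5, 0.05)      # lambda_1..lambda_4 from the paper

    def total_loss(l1, l2, l3, l4):
        """Weighted sum of the four loss terms with the reported weights."""
        return sum(w * l for w, l in zip(LAMBDAS, (l1, l2, l3, l4)))

    model = torch.nn.Linear(4096, EMBED_DIM)    # placeholder for the real model
    # Adam optimizer as stated in the paper; the learning rate is an assumption.
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)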