Knowledge Aware Semantic Concept Expansion for Image-Text Matching
Authors: Botian Shi, Lei Ji, Pan Lu, Zhendong Niu, Nan Duan
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted on Flickr30K and MSCOCO datasets, and prove that our model achieves state-of-the-art results due to the effectiveness of incorporating the external SCG. |
| Researcher Affiliation | Collaboration | (1) Beijing Institute of Technology; (2) Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; (3) Natural Language Computing, Microsoft Research Asia, Beijing, China; (4) University of California, Los Angeles |
| Pseudocode | Yes | Algorithm 1 Concept Expansion |
| Open Source Code | No | The paper provides a link for 'real showcases of retrieval' in the case study section, which are visualizations/results, not the general source code for the methodology. 'please check out this link: https://goo.gl/izcSN9.' |
| Open Datasets | Yes | Visual Genome [Krishna et al., 2017], MSCOCO [Lin et al., 2014], Flickr30K [Young et al., 2014] |
| Dataset Splits | Yes | MSCOCO: We follow [Karpathy and Fei-Fei, 2015] to prepare the training, validation and test dataset by splitting all images to 113,287 (for training), 5,000 (for validation) and 5,000 (for test). Flickr30K: We followed the split in [Karpathy and Fei-Fei, 2015] and [Faghri et al., 2017] that used 1,000 images for testing and 1,000 images for validation and the rest of them (28,783 images) for training. |
| Hardware Specification | No | The paper mentions models like LSTM and VGG19, but does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using LSTM, VGG19, ImageNet, and Adam Optimizer, but does not provide specific version numbers for any software libraries or frameworks (e.g., PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | The dimension of concept-enhanced image representation and text representation is e = 512. We used λ1 = 5.0, λ2 = 1.0, λ3 = 1.5 and λ4 = 0.05 as the hyper-parameters of loss function. An Adam Optimizer was adopted to optimize model's parameters. |
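The reported loss hyper-parameters (λ1–λ4) suggest the training objective is a weighted combination of four loss terms. A minimal sketch of that weighting, assuming a simple weighted sum (the term names and structure below are illustrative assumptions, not the paper's actual formulation):

```python
# Hedged sketch: the review table reports four loss weights
# (lambda1..lambda4). The term names below are hypothetical;
# the paper's actual loss components are not reproduced here.
LAMBDAS = {"lambda1": 5.0, "lambda2": 1.0, "lambda3": 1.5, "lambda4": 0.05}

def total_loss(terms):
    """Combine individual loss terms as a weighted sum using LAMBDAS."""
    return sum(LAMBDAS[name] * value for name, value in terms.items())

# Example: with every term equal to 1.0, the total is the sum of the weights.
print(total_loss({name: 1.0 for name in LAMBDAS}))  # → 7.55
```

In practice each term would be computed per batch (e.g. a ranking loss for each retrieval direction) before this weighting is applied by the optimizer step.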