Step-Wise Hierarchical Alignment Network for Image-Text Matching

Authors: Zhong Ji, Kexin Chen, Haoran Wang

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results on two benchmark datasets demonstrate the superiority of our proposed method. We conduct experiments on two public datasets, Flickr30k and MS-COCO, and the quantitative experimental results validate that our model achieves state-of-the-art performance on both datasets.
Researcher Affiliation | Academia | School of Electrical and Information Engineering, Tianjin University, Tianjin, China ({jizhong, kxchen, haoranwang}@tju.edu.cn)
Pseudocode | No | No structured pseudocode or algorithm blocks were found.
Open Source Code | No | The paper does not provide explicit statements or links indicating the release of source code for the described methodology.
Open Datasets | Yes | Two benchmark datasets are used in our experiments to test the performance of our method: (1) Flickr30k contains 31,783 images, each annotated with 5 sentences. Following [Karpathy and Fei-Fei, 2015], we split the dataset into 1,000 test images, 1,000 validation images, and 29,000 training images. (2) MS-COCO is another large-scale image captioning dataset with 123,287 images, each accompanied by 5 descriptions. We follow [Lee et al., 2018] to split the dataset into 5,000 images for validation, 5,000 images for testing, and the remaining 113,287 images for training.
Dataset Splits | Yes | Following [Karpathy and Fei-Fei, 2015], we split the dataset into 1,000 test images, 1,000 validation images, and 29,000 training images. We follow [Lee et al., 2018] to split the dataset into 5,000 images for validation, 5,000 images for testing, and the remaining 113,287 images for training.
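The split sizes quoted above can be sanity-checked with simple arithmetic. This is a minimal sketch; the dictionaries below are illustrative, not from the paper:

```python
# Split sizes quoted in the paper for the two benchmark datasets.
mscoco = {"train": 113_287, "val": 5_000, "test": 5_000}
flickr30k = {"train": 29_000, "val": 1_000, "test": 1_000}

# The MS-COCO splits partition all 123,287 images exactly.
assert sum(mscoco.values()) == 123_287

# The quoted Flickr30k splits cover 31,000 of the 31,783 images,
# leaving 783 images unaccounted for by the quoted numbers.
print(31_783 - sum(flickr30k.values()))  # -> 783
```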
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for the experiments are mentioned in the paper.
Software Dependencies | No | The paper mentions software such as the Adam optimizer and GloVe word embeddings but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | We train our model with the Adam optimizer for 30 epochs on each dataset. The dimension of the joint embedding space for image regions and textual words is set to 1024, the dimension of word embeddings is set to 300, and the other parameters are empirically set as follows: µ1 = 0.3, µ2 = 0.5, λ = 15, and m = 0.2.
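For readers reimplementing this setup, the quoted hyperparameters can be collected as below. The hinge-based triplet ranking loss with hardest-negative mining is the de facto standard objective in image-text matching, but its exact form here is an assumption, as are all variable names; only the numeric values come from the paper:

```python
# Hyperparameter values quoted from the paper.
EMBED_DIM = 1024      # joint embedding space for image regions and words
WORD_DIM = 300        # word embedding dimension (GloVe)
EPOCHS = 30           # trained with the Adam optimizer, per dataset
MU1, MU2 = 0.3, 0.5   # step weights (exact roles defined in the paper)
LAMBDA = 15           # scaling factor (exact role defined in the paper)
MARGIN = 0.2          # m, the ranking-loss margin

def triplet_ranking_loss(scores, margin=MARGIN):
    """Bidirectional hinge loss with hardest negatives (assumed form).

    scores[i][j] is the similarity of image i and caption j; the
    diagonal holds the ground-truth matched pairs.
    """
    n = len(scores)
    total = 0.0
    for i in range(n):
        pos = scores[i][i]
        # Hardest negative caption for image i, and image for caption i.
        hard_cap = max(scores[i][j] for j in range(n) if j != i)
        hard_img = max(scores[j][i] for j in range(n) if j != i)
        total += max(0.0, margin + hard_cap - pos)
        total += max(0.0, margin + hard_img - pos)
    return total
```

With well-separated similarities such as `[[0.9, 0.1], [0.2, 0.8]]`, every negative sits more than the margin below its positive and the loss is 0.0.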