Few-Shot Image and Sentence Matching via Gated Visual-Semantic Embedding

Authors: Yan Huang, Yang Long, Liang Wang (pp. 8489-8496)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Based on the fused metric, we perform extensive experiments in terms of few-shot and conventional image and sentence matching, and demonstrate the effectiveness of the proposed model by achieving the state-of-the-art results on two public benchmark datasets. [Experimental Results] To demonstrate the effectiveness of the proposed model, we perform experiments of few-shot and conventional image and sentence matching on two publicly available datasets. (see the fused-metric sketch after this table)
Researcher Affiliation | Academia | Yan Huang (1,3), Yang Long (4), Liang Wang (1,2,3). Affiliations: (1) Center for Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR); (2) Center for Excellence in Brain Science and Intelligence Technology (CEBSIT), Institute of Automation, Chinese Academy of Sciences (CASIA); (3) University of Chinese Academy of Sciences (UCAS); (4) Open Lab, School of Computing, Newcastle University.
Pseudocode | No | The paper describes algorithmic steps but does not include structured pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not contain any statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | 1) Flickr30k (Young et al. 2014) consists of 31783 images collected from the Flickr website, each accompanied by 5 human-annotated sentences. 2) MSCOCO (Lin et al. 2014) consists of 82783 training and 40504 validation images, each associated with 5 sentences.
Dataset Splits | Yes | For conventional image and sentence matching, we use the public training, validation and test splits on the two datasets. On the MSCOCO dataset, we perform 5-fold cross-validation and report the averaged results when using 1000 images for test. (see the cross-validation sketch after this table)
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions model names (e.g., Faster-RCNN, Skip-Gram, LSTM) and optimization techniques (e.g., stochastic gradient descent) but does not list any specific software libraries or platforms with version numbers.
Experiment Setup | Yes | Other parameters are empirically set as follows: H = 1024 and m = 0.2. When training the three modules, we use stochastic gradient descent with a learning rate of 0.01, momentum of 0.9, weight decay of 0.0005, batch size of 128, and gradient clipping at 0.1. The VSE modules are trained for 30 epochs to guarantee convergence, while the gated metric fusion module is trained for 100 epochs. (a training-configuration sketch follows this table)
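To situate the "fused metric" cited in the Research Type row: the paper combines similarity scores from its visual-semantic embedding modules through a gated metric fusion module. Below is a minimal sketch of what such a gated blend could look like; the sigmoid-gate parameterization, the `GatedMetricFusion` name, and the assumption of two precomputed similarity scores are illustrative guesses, not the authors' released code.

```python
import torch
import torch.nn as nn

class GatedMetricFusion(nn.Module):
    """Hypothetical sketch: blend two image-sentence similarity scores
    with a gate conditioned on the pair's embeddings (H = 1024)."""

    def __init__(self, dim=1024):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 1)  # assumed gate parameterization

    def forward(self, img_emb, sent_emb, sim_a, sim_b):
        # Gate in [0, 1] decides how much each metric contributes.
        g = torch.sigmoid(self.gate(torch.cat([img_emb, sent_emb], dim=-1)))
        return g.squeeze(-1) * sim_a + (1.0 - g.squeeze(-1)) * sim_b

fusion = GatedMetricFusion()
score = fusion(torch.randn(8, 1024), torch.randn(8, 1024),
               torch.rand(8), torch.rand(8))  # fused scores for 8 pairs
```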
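The MSCOCO protocol in the Dataset Splits row (5-fold cross-validation with 1000 test images, results averaged) is the standard 1K-test evaluation: the 5000-image test set is split into five folds of 1000 images, a retrieval metric is computed per fold, and the five scores are averaged. A minimal sketch of that averaging, assuming a precomputed image-by-sentence similarity matrix whose 5 captions per image are grouped and fold-ordered (the `recall_at_1` scorer is a simple stand-in):

```python
import numpy as np

def recall_at_1(sim_fold):
    """Recall@1 for image-to-sentence retrieval within one fold:
    a hit if the top-ranked sentence is one of the image's 5 captions."""
    top = sim_fold.argmax(axis=1)
    hits = [5 * i <= top[i] < 5 * (i + 1) for i in range(sim_fold.shape[0])]
    return float(np.mean(hits))

def evaluate_5fold(similarity, fold_size=1000, n_folds=5):
    """Average the metric over five 1000-image MSCOCO test folds;
    `similarity` is (5000 x 25000) with 5 captions per image."""
    scores = []
    for f in range(n_folds):
        imgs = slice(f * fold_size, (f + 1) * fold_size)
        sents = slice(f * fold_size * 5, (f + 1) * fold_size * 5)
        scores.append(recall_at_1(similarity[imgs, sents]))
    return float(np.mean(scores))
```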
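Finally, the hyperparameters quoted in the Experiment Setup row map directly onto an optimizer configuration. A minimal, self-contained PyTorch sketch under those stated values; the linear module, toy data loader, and ranking loss are placeholders for the paper's actual three modules and objective, and norm-based clipping is an assumption about what "gradient clipping at 0.1" means:

```python
import torch
from torch import nn

# Stand-in module and data; the real modules and loss are the paper's (not shown).
model = nn.Linear(2048, 1024)                                    # placeholder module
loader = [(torch.randn(128, 2048), torch.randn(128, 1024))] * 4  # batch size 128

optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=0.0005)

def margin_loss(img_emb, sent_emb, margin=0.2):
    # Hypothetical stand-in for the paper's ranking objective (m = 0.2).
    pos = nn.functional.cosine_similarity(img_emb, sent_emb, dim=-1)
    neg = nn.functional.cosine_similarity(img_emb, sent_emb.roll(1, 0), dim=-1)
    return torch.clamp(margin + neg - pos, min=0).mean()

for epoch in range(30):  # VSE modules: 30 epochs; fusion module would use 100
    for images, sentences in loader:
        loss = margin_loss(model(images), sentences)
        optimizer.zero_grad()
        loss.backward()
        # "Gradient clipping at 0.1", read here as norm clipping (assumption).
        torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)
        optimizer.step()
```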