HAL: Improved Text-Image Matching by Mitigating Visual Semantic Hubs

Authors: Fangyu Liu, Rongtian Ye, Xun Wang, Shuaipeng Li

AAAI 2020, pp. 11563-11571 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experiment our method with various configurations of model architectures and datasets. The method exhibits exceptionally good robustness and brings consistent improvement on the task of text-image matching across all settings.
Researcher Affiliation | Collaboration | Fangyu Liu (University of Cambridge, Cambridge, UK), Rongtian Ye (Aalto University, Espoo, Finland), Xun Wang (Malong Technologies, Shenzhen, China), Shuaipeng Li (SenseTime Research, Beijing, China)
Pseudocode | No | The paper describes its loss functions mathematically and conceptually but does not provide a pseudocode block or algorithm listing (a hedged sketch of a loss of this kind follows this table).
Open Source Code | Yes | Our code is released at: https://github.com/hardyqr/HAL.
Open Datasets | Yes | We use MS-COCO (Lin et al. 2014) and Flickr30k (Young et al. 2014) as our experimental datasets.
Dataset Splits | Yes | For MS-COCO... 113,287 images for training, 5,000 for validation and 5,000 for testing. Flickr30k has 30,000 images for training; 1,000 for validation; 1,000 for testing.
Hardware Specification | Yes | We do not include HAL+MB for (Vendrov et al. 2016) as it demands GPU memory exceeding 11GB, which is the limit of our used GTX 2080Ti.
Software Dependencies | No | The paper mentions software components such as GRU, ResNet152, Inception-ResNet-v2, and VGG19, but does not give version numbers for them or for the underlying languages and libraries (e.g., PyTorch, TensorFlow, Python).
Experiment Setup | Yes | For more details about hyperparameters and training configurations please refer to Table 3 and code release: https://github.com/hardyqr/HAL. Table 3 lists specific hyperparameters such as 'margin=0.2, lr=0.001, lr update=10, bs=128, epoch=30', 'γ=30, ϵ=0.3', and 'α=40, β=40, ϵ1=0.2, ϵ2=0.1' (a hedged configuration sketch follows this table).
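
Because the paper states its loss only in prose and equations, the following PyTorch sketch illustrates the kind of weighted, log-sum-exp ranking objective it describes: every negative contributes in proportion to how close it is to the matched pair, so "hub" candidates that sit near many queries dominate the gradient. This is an illustration under assumptions, not the authors' exact HAL formulation; the function name, the cosine-similarity input convention, and the precise weighting form are choices made here, while the default γ=30, ϵ=0.3 are the values quoted from Table 3.

    import torch
    import torch.nn.functional as F

    def weighted_matching_loss(sim, gamma=30.0, eps=0.3):
        """sim: (N, N) similarity matrix between N images (rows) and their
        N matched captions (columns); sim[i, i] is the positive pair.

        Smooth log-sum-exp ranking loss: each negative j adds
        exp(gamma * (sim[i, j] - sim[i, i] + eps)), so negatives that are
        nearly as similar as the positive receive exponentially more weight.
        """
        n = sim.size(0)
        pos = sim.diag()                                   # matched-pair similarities
        diag = torch.eye(n, dtype=torch.bool, device=sim.device)

        # image -> text: compare each row's negative captions against its positive
        i2t = gamma * (sim - pos.unsqueeze(1) + eps)
        i2t = i2t.masked_fill(diag, float('-inf'))         # exclude the positive itself
        # text -> image: compare each column's negative images against its positive
        t2i = gamma * (sim - pos.unsqueeze(0) + eps)
        t2i = t2i.masked_fill(diag, float('-inf'))

        # log(1 + sum_j exp(.)) == softplus(logsumexp(.)); divide by gamma to undo scaling
        loss = F.softplus(torch.logsumexp(i2t, dim=1)).mean()
        loss = loss + F.softplus(torch.logsumexp(t2i, dim=0)).mean()
        return loss / gamma

    # Usage with random, L2-normalized embeddings (cosine similarities):
    images = F.normalize(torch.randn(128, 1024), dim=1)
    captions = F.normalize(torch.randn(128, 1024), dim=1)
    print(weighted_matching_loss(images @ captions.t()).item())

The log-sum-exp form is a smooth stand-in for hardest-negative mining: as γ grows it approaches a max over negatives, while smaller γ spreads the penalty over many moderately hard negatives.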
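As a companion to the Experiment Setup row, here is a minimal sketch of how the quoted Table 3 settings might be collected into one training configuration. The key names (lr_update, batch_size) and the split of the γ/ϵ and α/β/ϵ1/ϵ2 groups into a base-loss and a memory-bank (HAL+MB) variant are assumptions for illustration; only the numeric values come from the quoted row.

    # Illustrative only: grouping and key names are assumptions;
    # the numbers are those quoted from Table 3.
    train_config = {
        "margin": 0.2,        # triplet margin for the baseline ranking loss
        "lr": 1e-3,           # initial learning rate
        "lr_update": 10,      # epoch at which the learning rate is decayed
        "batch_size": 128,
        "epochs": 30,
        "hal": {"gamma": 30, "eps": 0.3},                              # quoted as γ=30, ϵ=0.3
        "hal_mb": {"alpha": 40, "beta": 40, "eps1": 0.2, "eps2": 0.1},  # quoted α, β, ϵ1, ϵ2
    }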