Mining on Heterogeneous Manifolds for Zero-Shot Cross-Modal Image Retrieval

Authors: Fan Yang, Zheng Wang, Jing Xiao, Shin'ichi Satoh

AAAI 2020, pp. 12589-12596

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "we validate our method on visible vs. thermal datasets and achieve significant performance improvement."
Researcher Affiliation | Academia | "¹The University of Tokyo, Japan; ²National Institute of Informatics, Japan"
Pseudocode | No | The paper describes its methods through text and mathematical equations, but it does not include a formal pseudocode or algorithm block.
Open Source Code | Yes | "The code of this paper: https://github.com/fyang93/cross-modal-retrieval"
Open Datasets | Yes | "MNIST dataset (LeCun et al. 1998) (...) SVHN (Street View House Numbers) dataset (Netzer et al. 2011) (...) RegDB (Nguyen et al. 2017) (...) SYSU-MM01 (Wu et al. 2017)"
Dataset Splits | Yes | "RegDB (...) the entire dataset was divided into a training set and a testing set. (...) SYSU-MM01 (...) The training set contains 22,258 visible images and 11,909 thermal images of 395 persons. The testing set contains 3,803 thermal query images where 96 persons appeared, and 301 visible images randomly sampled for each person as the gallery set. (...) In the MNIST dataset, the numbers of images in the query and gallery set are 3,011 and 18,065 respectively. The SVHN dataset has 15,299 images in the gallery and 5,274 query images. (...) we consistently set β = 0.2 and the margin of triplet loss to 0.6 through validation."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for experiments.
Software Dependencies | No | The paper mentions using ResNet18 and ResNet50 as backbones but does not provide specific version numbers for software libraries, frameworks, or programming languages used (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup | Yes | "ResNet18 serves as the backbone CNN for MNIST and SVHN datasets, while ResNet50 is used for RegDB and SYSU-MM01 datasets (...) The dimension of the output of FC1 is set to 512 for all datasets. (...) The overall loss function for the cross-modal model is L = L_class + β · L_tri (Eq. 9), where β is a weight on the triplet loss. In our experiments, we consistently set β = 0.2 and the margin of triplet loss to 0.6 through validation."
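
The quoted experiment setup can be summarized in a short sketch. The following is a minimal PyTorch sketch, not the authors' implementation (the official code is in the linked repository); it assumes a standard cross-entropy classification loss for L_class, and the names CrossModalHead and overall_loss are illustrative only.

```python
# Minimal sketch of the reported setup: ResNet backbone + 512-d FC1 embedding,
# and the combined loss L = L_class + beta * L_tri (Eq. 9) with beta = 0.2 and
# triplet margin 0.6. Hypothetical class/function names; not the official code.
import torch.nn as nn
from torchvision import models


class CrossModalHead(nn.Module):
    """ResNet backbone followed by FC1 (output dim 512) and a classifier."""

    def __init__(self, num_classes, backbone="resnet50", embed_dim=512):
        super().__init__()
        resnet = getattr(models, backbone)()  # resnet18 for MNIST/SVHN, resnet50 for RegDB/SYSU-MM01
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop the original FC layer
        self.fc1 = nn.Linear(resnet.fc.in_features, embed_dim)        # FC1, output dimension 512
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        feat = self.backbone(x).flatten(1)
        embed = self.fc1(feat)
        logits = self.classifier(embed)
        return embed, logits


# L = L_class + beta * L_tri, with beta = 0.2 and triplet margin 0.6 (set through validation).
class_criterion = nn.CrossEntropyLoss()          # assumed form of the classification loss
triplet_criterion = nn.TripletMarginLoss(margin=0.6)
beta = 0.2


def overall_loss(logits, labels, anchor, positive, negative):
    l_class = class_criterion(logits, labels)
    l_tri = triplet_criterion(anchor, positive, negative)
    return l_class + beta * l_tri
```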