Mining on Heterogeneous Manifolds for Zero-Shot Cross-Modal Image Retrieval
Authors: Fan Yang, Zheng Wang, Jing Xiao, Shin'ichi Satoh (pp. 12589-12596)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | we validate our method on visible vs. thermal datasets and achieve significant performance improvement. |
| Researcher Affiliation | Academia | 1The University of Tokyo, Japan 2National Institute of Informatics, Japan |
| Pseudocode | No | The paper describes its methods through text and mathematical equations, but it does not include a formal pseudocode or algorithm block. |
| Open Source Code | Yes | The code of this paper: https://github.com/fyang93/cross-modal-retrieval |
| Open Datasets | Yes | MNIST dataset (LeCun et al. 1998) (...) SVHN (Street View House Numbers) dataset (Netzer et al. 2011) (...) RegDB (Nguyen et al. 2017) (...) SYSU-MM01 (Wu et al. 2017) |
| Dataset Splits | Yes | RegDB (...) the entire dataset was divided into a training set and a testing set. (...) SYSU-MM01 (...) The training set contains 22,258 visible images and 11,909 thermal images of 395 persons. The testing set contains 3,803 thermal query images where 96 persons appeared, and 301 visible images randomly sampled for each person as the gallery set. (...) In the MNIST dataset, the numbers of images in the query and gallery set are 3,011 and 18,065 respectively. While the SVHN dataset has 15,299 images in the gallery and 5,274 query images. (...) we consistently set β = 0.2 and the margin of triplet loss to 0.6 through validation. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for experiments. |
| Software Dependencies | No | The paper mentions using ResNet18 and ResNet50 as backbones but does not provide specific version numbers for software libraries, frameworks, or programming languages used (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | ResNet18 serves as the backbone CNN for MNIST and SVHN datasets, while ResNet50 is used for RegDB and SYSU-MM01 datasets (...) The dimension of the output of FC1 is set to 512 for all datasets. (...) The overall loss function for the cross-modal model is $L = L_{\text{class}} + \beta L_{\text{tri}}$ (Eq. 9), where β is a weight on the triplet loss. In our experiments, we consistently set β = 0.2 and the margin of triplet loss to 0.6 through validation. |
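
The Experiment Setup row specifies the objective of Eq. (9) and its hyperparameters (β = 0.2, triplet margin 0.6, 512-d FC1 embedding). Below is a minimal PyTorch sketch of that objective, assuming a ResNet18 backbone as described for MNIST/SVHN; the model class name, the triplet-index arguments, and the triplet selection are hypothetical placeholders, and the paper's actual manifold-based mining is not reproduced here.

```python
import torch.nn as nn
import torchvision.models as models

class CrossModalNet(nn.Module):
    """Backbone + FC1 embedding + classifier, per the setup row above (sketch)."""
    def __init__(self, num_classes, feat_dim=512):
        super().__init__()
        backbone = models.resnet18(weights=None)  # paper: ResNet18 for MNIST/SVHN, ResNet50 for RegDB/SYSU-MM01
        backbone.fc = nn.Identity()               # keep the 512-d pooled feature
        self.backbone = backbone
        self.fc1 = nn.Linear(512, feat_dim)       # FC1 output dimension set to 512 (paper's setting)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        feat = self.fc1(self.backbone(x))
        return feat, self.classifier(feat)

beta = 0.2                                   # weight on the triplet loss (paper's value)
ce_loss = nn.CrossEntropyLoss()              # L_class
tri_loss = nn.TripletMarginLoss(margin=0.6)  # L_tri with margin 0.6 (paper's value)

def overall_loss(feats, logits, labels, a_idx, p_idx, n_idx):
    """Eq. (9): L = L_class + beta * L_tri."""
    l_class = ce_loss(logits, labels)
    l_tri = tri_loss(feats[a_idx], feats[p_idx], feats[n_idx])
    return l_class + beta * l_tri
```

In a training loop, `feats` and `logits` would come from a forward pass over a mini-batch mixing both modalities, with `a_idx`/`p_idx`/`n_idx` chosen by whatever triplet-mining rule is in use; the paper's own mining strategy is described in its released code (https://github.com/fyang93/cross-modal-retrieval), not here.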