Cross-modal Common Representation Learning by Hybrid Transfer Network

Authors: Xin Huang, Yuxin Peng, Mingkuan Yuan

IJCAI 2017

Reproducibility Variable Result LLM Response
Research Type Experimental This paper proposes the Cross-modal Hybrid Transfer Network (CHTN) with two subnetworks: the modal-sharing transfer subnetwork uses the modality present in both the source and target domains as a bridge, transferring knowledge to both modalities simultaneously; the layer-sharing correlation subnetwork preserves the inherent cross-modal semantic correlation to further adapt to the cross-modal retrieval task. CHTN converts cross-modal data into a common representation for retrieval, and comprehensive experiments on 3 datasets show its effectiveness.
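Once CHTN has mapped images and texts into the shared space, cross-modal retrieval reduces to a nearest-neighbor search over common representations. A minimal sketch of that retrieval step, assuming representations are plain vectors (the toy data and the `retrieve` helper are hypothetical, not from the paper):

```python
import numpy as np

def retrieve(query_repr, gallery_reprs):
    """Rank gallery items by cosine similarity to the query in the
    learned common representation space."""
    q = query_repr / np.linalg.norm(query_repr)
    g = gallery_reprs / np.linalg.norm(gallery_reprs, axis=1, keepdims=True)
    sims = g @ q
    return np.argsort(-sims)  # gallery indices, best match first

# Toy example: a text query retrieving among 3 image representations.
text_query = np.array([1.0, 0.0, 0.0])
image_gallery = np.array([
    [0.9, 0.1, 0.0],   # closest to the query
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
])
ranking = retrieve(text_query, image_gallery)
print(ranking)  # -> [0 2 1]
```

The same function works in either direction (image query against a text gallery), since both modalities live in the same space after CHTN's projection.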
Researcher Affiliation Academia Xin Huang, Yuxin Peng, and Mingkuan Yuan, Institute of Computer Science and Technology, Peking University, Beijing 100871, China (pengyuxin@pku.edu.cn)
Pseudocode No The paper provides mathematical formulations for loss functions but no structured pseudocode or algorithm blocks.
Open Source Code No The paper mentions using the Caffe Model Zoo as a source for pre-trained models ('AlexNet pre-trained on ImageNet from the Caffe Model Zoo'), but does not provide a link to, or any statement about releasing, its own source code. The provided link 'http://caffe.berkeleyvision.org' is for the Caffe framework itself, not the paper's specific implementation.
Open Datasets Yes In the experiments, ImageNet serves as the single-modal source domain, and we adopt a widely-used subset with over 1,200,000 labeled images [Krizhevsky et al., 2012] from ILSVRC 2012. We perform knowledge transfer from ImageNet to 3 cross-modal datasets as target domains respectively, and conduct cross-modal retrieval on them, namely Wikipedia, NUS-WIDE-10k and Pascal Sentences. For fair comparison, we strictly take the same dataset partition according to [Feng et al., 2014; Peng et al., 2016] for our CHTN and all the compared methods in the experiments.
Dataset Splits Yes Wikipedia dataset ... The dataset is randomly split into training set with 2,173 pairs, test set with 462 pairs, and validation set with 231 pairs. NUS-WIDE-10k dataset ... The dataset is randomly split into training set with 8,000 pairs, test set with 1,000 pairs, and validation set with 1,000 pairs evenly from the 10 categories. Pascal Sentences dataset ... Pascal Sentences dataset is randomly split into training set with 800 pairs, test set with 100 pairs, and validation set with 100 pairs evenly from the 20 categories.
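The reported partitions can be sanity-checked with a simple random split. The sketch below uses the Wikipedia sizes quoted above; note the actual partition follows [Feng et al., 2014], so this helper is only illustrative:

```python
import random

def split_pairs(pairs, n_train, n_test, n_val, seed=0):
    """Randomly partition image/text pairs into train/test/val sets
    (sizes taken from the Wikipedia split reported above)."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:n_train + n_test + n_val])

pairs = list(range(2173 + 462 + 231))  # 2,866 Wikipedia image/text pairs
train, test, val = split_pairs(pairs, 2173, 462, 231)
print(len(train), len(test), len(val))  # -> 2173 462 231
```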
Hardware Specification No The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running experiments.
Software Dependencies No The paper mentions 'Caffe' and 'AlexNet' but does not specify version numbers for Caffe or any other software dependencies.
Experiment Setup Yes The base learning rates of all the fully-connected layers are set to be 0.01. The MMD loss layers are implemented following [Long et al., 2015]... there are two contrastive loss layers from Caffe between the two fully-connected layers... Moreover, because the magnitude of Loss_Cross is much larger than those of Loss_Single, Loss_Source, and Loss_Correlation (about 1,000 times), we set its weight as 0.001, and those of Loss_Single, Loss_Source, and Loss_Correlation are all 1.
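The weighting scheme above amounts to a single scalar objective. In the sketch below, only the weights (0.001 for the cross-modal term, 1 for the other three) come from the paper; the per-term loss values are placeholders chosen to illustrate the roughly 1,000x magnitude gap:

```python
# Weights reported in the paper: Loss_Cross is scaled down because its
# magnitude is about 1,000x larger than the other three terms.
WEIGHTS = {"cross": 0.001, "single": 1.0, "source": 1.0, "correlation": 1.0}

def total_loss(losses):
    """Weighted sum of CHTN's four loss terms (values are placeholders)."""
    return sum(WEIGHTS[name] * value for name, value in losses.items())

# Hypothetical per-term values illustrating the magnitude gap.
losses = {"cross": 1500.0, "single": 1.2, "source": 0.8, "correlation": 2.0}
print(total_loss(losses))  # 0.001*1500 + 1.2 + 0.8 + 2.0 = 5.5
```

With the 0.001 weight applied, the scaled cross-modal term is comparable in magnitude to the other three, so no single term dominates the gradient.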