Cross-modal Common Representation Learning by Hybrid Transfer Network

Authors: Xin Huang, Yuxin Peng, Mingkuan Yuan

IJCAI 2017

Reproducibility Variable Result LLM Response
Research Type Experimental This paper proposes the Cross-modal Hybrid Transfer Network (CHTN) with two subnetworks: the modal-sharing transfer subnetwork uses the modality present in both the source and target domains as a bridge, transferring knowledge to both modalities simultaneously; the layer-sharing correlation subnetwork preserves the inherent cross-modal semantic correlation to further adapt to the cross-modal retrieval task. CHTN converts cross-modal data into a common representation for retrieval, and comprehensive experiments on 3 datasets show its effectiveness.
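Once CHTN has mapped images and texts into the shared space, cross-modal retrieval reduces to a nearest-neighbor search over common representations. A minimal sketch of that retrieval step, assuming representations are plain vectors (the toy data and the `retrieve` helper are hypothetical, not from the paper):

```python
import numpy as np

def retrieve(query_repr, gallery_reprs):
    """Rank gallery items by cosine similarity to the query in the
    learned common representation space."""
    q = query_repr / np.linalg.norm(query_repr)
    g = gallery_reprs / np.linalg.norm(gallery_reprs, axis=1, keepdims=True)
    sims = g @ q
    return np.argsort(-sims)  # gallery indices, best match first

# Toy example: a text query retrieving among 3 image representations.
text_query = np.array([1.0, 0.0, 0.0])
image_gallery = np.array([
    [0.9, 0.1, 0.0],   # closest to the query
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
])
ranking = retrieve(text_query, image_gallery)
print(ranking)  # -> [0 2 1]
```

The same function works in either direction (image query against a text gallery), since both modalities live in the same space after CHTN's projection.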
Researcher Affiliation Academia Xin Huang, Yuxin Peng, and Mingkuan Yuan, Institute of Computer Science and Technology, Peking University, Beijing 100871, China (pengyuxin@pku.edu.cn)
Pseudocode No The paper provides mathematical formulations for loss functions but no structured pseudocode or algorithm blocks.
Open Source Code No The paper mentions using the Caffe Model Zoo as a source for pre-trained models ('AlexNet pre-trained on ImageNet from the Caffe Model Zoo'), but does not provide a link to, or any statement about releasing, its own source code. The provided link 'http://caffe.berkeleyvision.org' is for the Caffe framework itself, not the paper's specific implementation.
Open Datasets Yes In the experiments, ImageNet serves as the single-modal source domain, and we adopt a widely-used subset with over 1,200,000 labeled images [Krizhevsky et al., 2012] from ILSVRC 2012. We perform knowledge transfer from ImageNet to 3 cross-modal datasets as target domains respectively, and conduct cross-modal retrieval on them, namely Wikipedia, NUS-WIDE-10k and Pascal Sentences. For fair comparison, we strictly take the same dataset partition according to [Feng et al., 2014; Peng et al., 2016] for our CHTN and all the compared methods in the experiments.
Dataset Splits Yes Wikipedia dataset ... The dataset is randomly split into training set with 2,173 pairs, test set with 462 pairs, and validation set with 231 pairs. NUS-WIDE-10k dataset ... The dataset is randomly split into training set with 8,000 pairs, test set with 1,000 pairs, and validation set with 1,000 pairs evenly from the 10 categories. Pascal Sentences dataset ... Pascal Sentences dataset is randomly split into training set with 800 pairs, test set with 100 pairs, and validation set with 100 pairs evenly from the 20 categories.
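The reported partitions can be sanity-checked with a simple random split. The sketch below uses the Wikipedia sizes quoted above; note the actual partition follows [Feng et al., 2014], so this helper is only illustrative:

```python
import random

def split_pairs(pairs, n_train, n_test, n_val, seed=0):
    """Randomly partition image/text pairs into train/test/val sets
    (sizes taken from the Wikipedia split reported above)."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:n_train + n_test + n_val])

pairs = list(range(2173 + 462 + 231))  # 2,866 Wikipedia image/text pairs
train, test, val = split_pairs(pairs, 2173, 462, 231)
print(len(train), len(test), len(val))  # -> 2173 462 231
```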
Hardware Specification No The paper does not provide specific hardware details such as GPU models, CPU models, or memory specifications used for running experiments.
Software Dependencies No The paper mentions 'Caffe' and 'AlexNet' but does not specify version numbers for Caffe or any other software dependencies.
Experiment Setup Yes The base learning rates of all the fully-connected layers are set to be 0.01. The MMD loss layers are implemented following [Long et al., 2015]... there are two contrastive loss layers from Caffe between the two fully-connected layers... Moreover, because the magnitude of Loss_Cross is much larger than those of Loss_Single, Loss_Source, and Loss_Correlation (about 1,000 times), we set its weight as 0.001, and those of Loss_Single, Loss_Source, and Loss_Correlation are all 1.
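The weighting scheme above amounts to a single scalar objective. In the sketch below, only the weights (0.001 for the cross-modal term, 1 for the other three) come from the paper; the per-term loss values are placeholders chosen to illustrate the roughly 1,000x magnitude gap:

```python
# Weights reported in the paper: Loss_Cross is scaled down because its
# magnitude is about 1,000x larger than the other three terms.
WEIGHTS = {"cross": 0.001, "single": 1.0, "source": 1.0, "correlation": 1.0}

def total_loss(losses):
    """Weighted sum of CHTN's four loss terms (values are placeholders)."""
    return sum(WEIGHTS[name] * value for name, value in losses.items())

# Hypothetical per-term values illustrating the magnitude gap.
losses = {"cross": 1500.0, "single": 1.2, "source": 0.8, "correlation": 2.0}
print(total_loss(losses))  # 0.001*1500 + 1.2 + 0.8 + 2.0 = 5.5
```

With the 0.001 weight applied, the scaled cross-modal term is comparable in magnitude to the other three, so no single term dominates the gradient.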