Dual Adversarial Networks for Zero-shot Cross-media Retrieval

Authors: Jingze Chi, Yuxin Peng

IJCAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'Experiments on three widely-used cross-media retrieval datasets show the effectiveness of our approach.'
Researcher Affiliation | Academia | Institute of Computer Science and Technology, Peking University, Beijing, China
Pseudocode | No | The paper describes the training procedure and model architecture in text and equations, but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper states 'We adopt TensorFlow to implement our model' and cites 'www.tensorflow.org', but does not state that the source code for the DANZCR method is open source or provide a link to it.
Open Datasets | Yes | The Wikipedia dataset [Rasiwasia et al., 2010] is widely used for cross-media retrieval evaluation; the Pascal Sentence dataset [Farhadi et al., 2010] is selected from the 2008 PASCAL development kit; and the NUS-WIDE dataset [Chua et al., 2009] consists of about 270,000 images whose tags are categorized into 81 categories.
Dataset Splits | No | The paper explicitly describes training and testing sets for all datasets (e.g., '2,173 pairs are selected as training set and 693 pairs are selected as testing set') and a further division into a 'seen category set' and an 'unseen category set', but does not mention a validation or development set.
Hardware Specification | No | The paper mentions using TensorFlow for implementation but does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper states 'We adopt TensorFlow to implement our model' but does not provide version numbers for other key software components such as Word2Vec, VGGNet, or Doc2Vec.
Experiment Setup | Yes | 'We adopt TensorFlow to implement our model with a base learning rate of 10^-4 and dropout probability 0.9. The parameters λF and λR are set to 10^-2. ... The forward generative models with three fully-connected layers are adopted for both image and text to generate common representations, with each layer followed by a ReLU layer and a dropout layer except the last. The numbers of hidden units are 4,096, 4,096 and 300. The reverse generative models of both image and text are composed of three fully-connected layers to reconstruct image and text representations, with 4,096 hidden units for the first two layers. The forward and reverse discriminative models have a similar structure of three fully-connected layers with 4,096, 2,048 and 1 hidden units...'
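The layer sizes quoted in the setup row can be sketched as plain matrix arithmetic. This is a minimal NumPy illustration, not the authors' TensorFlow implementation: the input dimensionality (4,096, as for VGGNet fc-layer features), the weight initialization, and the reuse of one forward function for the discriminator are assumptions, and the paper's 'dropout probability 0.9' is treated here as the keep probability.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def dropout(x, keep_prob=0.9, train=True):
    # Inverted dropout; assumes the paper's 0.9 is the keep probability.
    if not train:
        return x
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

def forward_pass(x, weights):
    # Stack of fully-connected layers; ReLU + dropout follow every
    # layer except the last, as described in the experiment setup.
    h = x
    for i, (W, b) in enumerate(weights):
        h = h @ W + b
        if i < len(weights) - 1:
            h = dropout(relu(h))
    return h

def init_weights(sizes):
    # Small random init; the paper does not specify an initializer.
    return [(rng.standard_normal((m, n)) * 0.01, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

# Forward generator: three FC layers with 4,096, 4,096 and 300 units.
gen_w = init_weights([4096, 4096, 4096, 300])
# Forward discriminator: three FC layers with 4,096, 2,048 and 1 units.
disc_w = init_weights([300, 4096, 2048, 1])

x = rng.standard_normal((2, 4096))       # hypothetical batch of features
z = forward_pass(x, gen_w)               # common representation, (2, 300)
score = forward_pass(z, disc_w)          # discriminator output, (2, 1)
```

The reverse generative models described in the row would mirror this shape back toward the original feature dimensions (4,096 units in the first two layers).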