Dual Adversarial Graph Neural Networks for Multi-label Cross-modal Retrieval

Authors: Shengsheng Qian, Dizhan Xue, Huaiwen Zhang, Quan Fang, Changsheng Xu (pp. 2440-2448)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments conducted on two cross-modal retrieval benchmark datasets, NUS-WIDE and MIRFlickr, indicate the superiority of DAGNN.
Researcher Affiliation | Academia | (1) National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; (2) University of Chinese Academy of Sciences; (3) Peng Cheng Laboratory
Pseudocode | No | The paper does not contain a clearly labeled pseudocode block or algorithm.
Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | NUS-WIDE: "we randomly pick up 2,000 image-text pairs as the testing set and the rest as the training set." MIRFlickr: "2,000 image-text pairs are randomly selected as the testing set and the rest are used for training."
Dataset Splits | No | The paper specifies training and testing sets but does not mention a separate validation set or its size/proportion for hyperparameter tuning. It states "we validate the hyper-parameters α and β" without linking this to a specific validation split.
Hardware Specification | No | The paper does not describe the hardware (e.g., GPU model, CPU type) used to run the experiments.
Software Dependencies | No | The paper states the method is "implemented on Pytorch deep learning framework" but gives no version number for PyTorch or any other software dependency.
Experiment Setup | Yes | The batch size m is set as 1024 for NUS-WIDE and 100 for MIRFlickr. The initial learning rates of the optimizer are 0.00005 on both datasets. ... we validate the hyper-parameters α and β and finally set α = 0.2, β = 0.2 for both datasets. ... The multi-hop graph neural networks consist of five GAT layers on NUS-WIDE and four GAT layers on MIRFlickr together with one aggregation layer, in which the output dimensionality of each GAT layer and aggregation layer is 1,024.
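The multi-hop GAT stack described in the setup row can be illustrated with a small sketch. This is not the authors' code: it implements one simplified single-head graph-attention layer in NumPy, stacks five of them (mirroring the NUS-WIDE configuration), and fuses the per-hop outputs with a single linear "aggregation" layer. The tiny feature dimensions, the random graph, and the concatenate-then-project form of the aggregation layer are all illustrative assumptions (the paper's layers output 1,024-dimensional features, and its exact aggregation is not specified here).

```python
import numpy as np

def gat_layer(H, A, W, a_src, a_dst):
    """Simplified single-head graph-attention layer.
    H: (N, F) node features; A: (N, N) adjacency (1 = edge, self-loops included).
    W: (F, F_out) projection; a_src, a_dst: (F_out,) attention vectors."""
    Z = H @ W                                   # project node features
    # attention logits e[i, j] = LeakyReLU(a_src . z_i + a_dst . z_j)
    e = Z @ a_src[:, None] + (Z @ a_dst[:, None]).T
    e = np.where(e > 0, e, 0.2 * e)             # LeakyReLU (slope 0.2)
    e = np.where(A > 0, e, -1e9)                # mask non-edges before softmax
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return np.maximum(alpha @ Z, 0)             # attention-weighted sum + ReLU

rng = np.random.default_rng(0)
N, F = 6, 8                                     # toy sizes; paper uses 1,024-d outputs
A = (np.eye(N) + (rng.random((N, N)) > 0.6) > 0).astype(float)  # random graph
H = rng.standard_normal((N, F))

hops = []
for _ in range(5):                              # five GAT layers, as on NUS-WIDE
    W = 0.1 * rng.standard_normal((F, F))
    H = gat_layer(H, A, W, rng.standard_normal(F), rng.standard_normal(F))
    hops.append(H)

# assumed aggregation layer: concatenate all hop outputs, then one linear map
agg = np.concatenate(hops, axis=1) @ (0.1 * rng.standard_normal((5 * F, F)))
print(agg.shape)                                # (6, 8): one fused vector per node
```

Because self-loops are kept in `A`, every softmax row has at least one valid edge, so the attention weights are always well defined.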