Deep Graph-neighbor Coherence Preserving Network for Unsupervised Cross-modal Hashing

Authors: Jun Yu, Hao Zhou, Yibing Zhan, Dacheng Tao (pp. 4626-4634)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive experiments on three public UCMH datasets. The experimental results demonstrate the superiority of DGCPN, e.g., by improving the mean average precision from 0.722 to 0.751 on MIRFlickr-25K using 64-bit hashing codes to retrieve texts from images." "We conducted extensive experiments on three public datasets. The improved performance compared with the state-of-the-art demonstrates the competitiveness of DGCPN."
Researcher Affiliation | Academia | Jun Yu 1, Hao Zhou 1, Yibing Zhan 1, Dacheng Tao 2; 1 Hangzhou Dianzi University, 2 The University of Sydney
Pseudocode | Yes | Algorithm 1: Graph-neighbor Coherence Preserving
  Input: M training images and texts; validation images and texts; batch size N; hash code length db; max training epochs E; trade-off parameters α, γ, λ1, and λ2; k-nearest number and scale parameter β
  Output: hashing functions ImgNet(·, θI) and TxtNet(·, θT)
  1: Initialize θI and θT
  2: Extract image and text features of the training set and obtain the graph-neighbor coherence of all training data
  3: for each i ∈ [1, E] do
  4:   for each j ∈ [1, M/N] do
  5:     Obtain a training batch of size N and the corresponding Sgc(HI, HT)
  6:     Update θI and θT using L(HI, HT)
  7:     Update θI using L(HI, BT)
  8:     Update θT using L(BI, HT)
  9:   end for
  10:  Compute the MAP on the validation set; if converged, stop
  11: end for
  12: return ImgNet(·, θI) and TxtNet(·, θT)
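The control flow of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the linear `img_net`/`txt_net` stand-ins and the surrogate `loss` are our own placeholders, and the three loss terms are only evaluated (not backpropagated) to show the alternating-update structure of lines 5-8.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the image/text hashing networks (the paper uses
# d_I-4096-d_b and d_T-4096-d_b MLPs; these linear maps are placeholders).
def img_net(x, theta):
    return np.tanh(x @ theta)   # continuous codes H_I in (-1, 1)

def txt_net(x, theta):
    return np.tanh(x @ theta)   # continuous codes H_T

def binarize(h):
    return np.sign(h)           # binary codes B = sign(H)

# Hypothetical surrogate for the coherence-preserving loss L: mean squared
# gap between the similarity matrices induced by the two code matrices.
def loss(a, b):
    return float(np.mean((a @ a.T - b @ b.T) ** 2))

M, N, d_i, d_t, d_b, E = 64, 8, 16, 12, 4, 2
imgs = rng.normal(size=(M, d_i))
txts = rng.normal(size=(M, d_t))
theta_i = rng.normal(scale=0.1, size=(d_i, d_b))
theta_t = rng.normal(scale=0.1, size=(d_t, d_b))

history = []
for epoch in range(E):                     # line 3 of Algorithm 1
    for j in range(M // N):                # line 4: mini-batches
        batch = slice(j * N, (j + 1) * N)  # line 5: draw a batch
        h_i = img_net(imgs[batch], theta_i)
        h_t = txt_net(txts[batch], theta_t)
        # Lines 6-8: the paper alternates updates of theta_I and theta_T
        # on L(H_I, H_T), L(H_I, B_T), and L(B_I, H_T); here we only
        # evaluate the three terms to expose the loop structure.
        history.append(loss(h_i, h_t)
                       + loss(h_i, binarize(h_t))
                       + loss(binarize(h_i), h_t))
    # line 10: validation MAP / convergence check would go here

print(len(history))  # one combined loss per batch: E * (M // N) = 16
```

In the actual method, lines 6-8 each trigger a gradient step (the binary codes are treated as constants in the mixed terms), which this sketch deliberately omits.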
Open Source Code | Yes | We will release the source code package and the trained model on https://github.com/Atmegal/DGCPN.
Open Datasets | Yes | Three public datasets are adopted in our experiments: Wikipedia (Rasiwasia et al. 2010), MIRFlickr-25K (Huiskes and Lew 2008), and NUS-WIDE (Chua et al. 2009).
Dataset Splits | Yes | The Wikipedia dataset consists of 2,866 image-text pairs from 10 categories. We split it into retrieval/test-query/validation-query sets of 2,173/462/231 image-text pairs; the whole retrieval set is used for training. The MIRFlickr-25K dataset contains 20,015 image-tag pairs with multi-label annotations from 24 classes. We split it into retrieval/test-query/validation-query sets of 16,015/2,000/2,000 image-tag pairs; 5,000 image-tag pairs from the retrieval set are used for training. The NUS-WIDE dataset provides 186,577 image-tag pairs covering the top-10 concepts. We split it into retrieval/test-query/validation-query sets of 182,577/2,000/2,000 image-tag pairs; 5,000 image-tag pairs from the retrieval set are used for training. In addition, we select 10,000 image-text pairs from the whole retrieval set as the validation retrieval set for computational efficiency.
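As a sanity check, the three-way splits above partition each dataset exactly; a minimal sketch verifying the stated counts:

```python
# Stated sizes of the retrieval / test-query / validation-query splits;
# "total" is the stated overall dataset size from the row above.
SPLITS = {
    "Wikipedia":     {"total": 2_866,   "retrieval": 2_173,   "test": 462,   "val": 231},
    "MIRFlickr-25K": {"total": 20_015,  "retrieval": 16_015,  "test": 2_000, "val": 2_000},
    "NUS-WIDE":      {"total": 186_577, "retrieval": 182_577, "test": 2_000, "val": 2_000},
}

# Each split sums back to the full dataset size.
for name, s in SPLITS.items():
    assert s["retrieval"] + s["test"] + s["val"] == s["total"], name

print("all splits consistent")
```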
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions components such as VGG-19 and LDA, but does not provide specific version numbers for software dependencies or libraries (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | The layers of the similarity-preserving subnetworks are set as d_I-4096-d_b for images and d_T-4096-d_b for texts. We use a mini-batch SGD optimizer with 0.9 momentum and 0.0005 weight decay. The mini-batch size is 32 and the learning rate is 0.005. For simplicity, we perform a 3-step grid search for the parameters. First, we decide the parameter of the pairwise distance... The final parameters are: Wikipedia: α=0.3, γ=0.3, λ1=1, λ2=1, β=900, k=600; MIRFlickr-25K: α=0.01, γ=0.3, λ1=1, λ2=1, β=4000, k=2000; NUS-WIDE: α=0.1, γ=0.3, λ1=1, λ2=1, β=4500, k=2000.
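The per-dataset settings above can be collected into a single configuration table. A minimal sketch; the key names are our own shorthand and may differ from the official code release:

```python
# Optimizer settings shared by all three datasets (as reported).
OPTIMIZER = {"optimizer": "SGD", "momentum": 0.9, "weight_decay": 5e-4,
             "batch_size": 32, "learning_rate": 0.005}

# Per-dataset trade-off parameters from the grid search.
PARAMS = {
    "Wikipedia":     dict(alpha=0.3,  gamma=0.3, lambda1=1, lambda2=1, beta=900,  k=600),
    "MIRFlickr-25K": dict(alpha=0.01, gamma=0.3, lambda1=1, lambda2=1, beta=4000, k=2000),
    "NUS-WIDE":      dict(alpha=0.1,  gamma=0.3, lambda1=1, lambda2=1, beta=4500, k=2000),
}

# Only alpha, beta, and k were re-tuned per dataset; the rest coincide.
shared = {key for key in PARAMS["Wikipedia"]
          if len({p[key] for p in PARAMS.values()}) == 1}
print(sorted(shared))  # ['gamma', 'lambda1', 'lambda2']
```

Grouping the settings this way makes it easy to see which knobs the grid search actually moved between datasets (α, β, k) and which stayed fixed (γ, λ1, λ2).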