Deep Graph-neighbor Coherence Preserving Network for Unsupervised Cross-modal Hashing
Authors: Jun Yu, Hao Zhou, Yibing Zhan, Dacheng Tao
AAAI 2021, pp. 4626-4634 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on three public UCMH datasets. The experimental results demonstrate the superiority of DGCPN, e.g., by improving the mean average precision from 0.722 to 0.751 on MIRFlickr-25K using 64-bit hashing codes to retrieve texts from images. We conducted extensive experiments on three public datasets. The improved performance compared with the state-of-the-art demonstrates the competitiveness of DGCPN. |
| Researcher Affiliation | Academia | Jun Yu 1, Hao Zhou1, Yibing Zhan1, Dacheng Tao2 1 Hang Zhou Dianzi University 2 The University of Sydney |
| Pseudocode | Yes | Algorithm 1 Graph-neighbor Coherence Preserving. Input: M training images and texts; validation images and texts; batch size N; hash code length db; max training epochs E; trade-off parameters α, γ, λ1, and λ2; k-nearest number and scale parameter β. Output: hashing functions ImgNet(·, θI) and TxtNet(·, θT). 1: Initialize θI and θT; 2: extract image and text features of the training set and obtain the graph-neighbor coherence of all training data; 3: for each i ∈ [1, E] do 4: for each j ∈ [1, M/N] do 5: obtain a training batch of size N and the corresponding Sgc(HI, HT); 6: update θI and θT using L(HI, HT); 7: update θI using L(HI, BT); 8: update θT using L(BI, HT); 9: end for 10: calculate the MAP on the validation set; if converged, stop; 11: end for 12: return ImgNet(·, θI) and TxtNet(·, θT). (A runnable sketch of this loop is given after the table.) |
| Open Source Code | Yes | We will release the source code package and the trained model on https://github.com/Atmegal/DGCPN. |
| Open Datasets | Yes | Three public datasets are adopted in our experiments: Wikipedia (Rasiwasia et al. 2010), MIRFlickr-25K (Huiskes and Lew 2008), and NUS-WIDE (Chua et al. 2009). |
| Dataset Splits | Yes | The Wikipedia dataset consists of 2,866 image-text pairs from 10 categories. We split it into retrieval/test query/validation query sets of 2,173/462/231 image-text pairs; the whole retrieval set is used for training. The MIRFlickr-25K dataset contains 20,015 image-tag pairs with multiple labels from 24 classes. We split it into retrieval/test query/validation query sets of 16,015/2,000/2,000 image-tag pairs; 5,000 image-tag pairs of the retrieval set are used for training. The NUS-WIDE dataset provides 186,577 image-tag pairs over the top-10 concepts. We split it into retrieval/test query/validation query sets of 182,577/2,000/2,000 image-tag pairs; 5,000 image-tag pairs of the retrieval set are used for training. Besides, we select 10,000 image-text pairs from the whole retrieval set as the validation retrieval set for computational efficiency. (See the split sketch after the table.) |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions tools like VGG-19 and LDA, but does not provide specific version numbers for software dependencies or libraries (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | The layers of the similarity-preserving subnetworks are set to dI-4096-db for images and dT-4096-db for texts. We use a mini-batch SGD optimizer with 0.9 momentum and 0.0005 weight decay. The mini-batch size is set to 32 and the learning rate to 0.005. For simplicity, we perform a 3-step grid search for the parameters. First, we decide the parameter of the pairwise distance... The final parameters are: for Wikipedia: α=0.3, γ=0.3, λ1=1, λ2=1, β=900, and k=600; for MIRFlickr-25K: α=0.01, γ=0.3, λ1=1, λ2=1, β=4000, and k=2000; and for NUS-WIDE: α=0.1, γ=0.3, λ1=1, λ2=1, β=4500, and k=2000. (The subnetwork and optimizer configuration are sketched in code after the table.) |
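
For concreteness, here is a minimal PyTorch sketch of the alternating updates in Algorithm 1 (lines 6-8). The `HashNet` class and `similarity_loss` are hypothetical stand-ins: the paper's actual objective combines the trade-off parameters α, γ, λ1, and λ2, and the authoritative implementation is the authors' repository at https://github.com/Atmegal/DGCPN.

```python
# Minimal sketch of Algorithm 1's inner loop, assuming PyTorch.
# The loss below is a generic similarity-preserving placeholder
# (an assumption), not the paper's exact objective.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HashNet(nn.Module):
    """Similarity-preserving subnetwork: d_in -> 4096 -> d_b hash bits."""

    def __init__(self, d_in: int, d_b: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, d_b),
            nn.Tanh(),  # continuous relaxation of the binary codes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def similarity_loss(h_a, h_b, s_gc):
    """Placeholder loss: match the cosine similarity of code pairs to the
    precomputed graph-neighbor coherence matrix s_gc (N x N)."""
    sim = F.cosine_similarity(h_a.unsqueeze(1), h_b.unsqueeze(0), dim=-1)
    return F.mse_loss(sim, s_gc)


def train_step(img_net, txt_net, opt_i, opt_t, x_img, x_txt, s_gc):
    # Line 6: update theta_I and theta_T using L(H_I, H_T).
    loss = similarity_loss(img_net(x_img), txt_net(x_txt), s_gc)
    opt_i.zero_grad(); opt_t.zero_grad()
    loss.backward()
    opt_i.step(); opt_t.step()

    # Line 7: update theta_I against the binarized text codes B_T.
    b_t = torch.sign(txt_net(x_txt)).detach()
    loss_i = similarity_loss(img_net(x_img), b_t, s_gc)
    opt_i.zero_grad(); loss_i.backward(); opt_i.step()

    # Line 8: update theta_T against the binarized image codes B_I.
    b_i = torch.sign(img_net(x_img)).detach()
    loss_t = similarity_loss(b_i, txt_net(x_txt), s_gc)
    opt_t.zero_grad(); loss_t.backward(); opt_t.step()
```

Lines 7 and 8 cross-couple each modality's continuous codes with the other modality's binarized codes, which is why `torch.sign(...).detach()` appears on the branch that is not being updated.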
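The MIRFlickr-25K split described in the Dataset Splits row can be reproduced at the index level as follows. The fixed seed and the random permutation are assumptions, since the paper does not specify how pairs are assigned to the splits.

```python
# Index-level sketch of the MIRFlickr-25K split quoted above:
# test/validation query = 2,000 each; retrieval = 16,015;
# 5,000 retrieval pairs used for training. Seed is an assumption.
import numpy as np

rng = np.random.default_rng(0)
idx = rng.permutation(20_015)        # 20,015 image-tag pairs in total

test_query = idx[:2_000]
val_query = idx[2_000:4_000]
retrieval = idx[4_000:]              # remaining 16,015 pairs
train = retrieval[:5_000]            # 5,000 pairs used for training
```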
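Finally, a sketch of the Experiment Setup row, again assuming PyTorch. Only the layer widths (dI-4096-db, dT-4096-db), batch size, and optimizer hyperparameters come from the paper; the input feature dimensions below are assumptions, since the paper mentions VGG-19 and LDA features without stating dI and dT here.

```python
# Subnetworks and mini-batch SGD as reported: momentum 0.9,
# weight decay 0.0005, learning rate 0.005, batch size 32.
import torch
import torch.nn as nn

d_img, d_txt, d_b = 4096, 1386, 64   # feature dims are assumptions

img_net = nn.Sequential(nn.Linear(d_img, 4096), nn.ReLU(),
                        nn.Linear(4096, d_b), nn.Tanh())
txt_net = nn.Sequential(nn.Linear(d_txt, 4096), nn.ReLU(),
                        nn.Linear(4096, d_b), nn.Tanh())

opt_i = torch.optim.SGD(img_net.parameters(), lr=0.005,
                        momentum=0.9, weight_decay=0.0005)
opt_t = torch.optim.SGD(txt_net.parameters(), lr=0.005,
                        momentum=0.9, weight_decay=0.0005)
```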