Weakly Supervised Deep Hyperspherical Quantization for Image Retrieval
Authors: Jinpeng Wang, Bin Chen, Qiang Zhang, Zaiqiao Meng, Shangsong Liang, Shutao Xia
AAAI 2021, pp. 2755-2763
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that WSDHQ can achieve state-of-the-art performance on weakly-supervised compact coding. We conduct extensive experiments to evaluate our proposed WSDHQ model against several state-of-the-art shallow and deep hashing methods on two web image datasets. The MAP results of all methods are reported in Table 1, which shows that the proposed WSDHQ model substantially outperforms all the comparison methods. |
| Researcher Affiliation | Academia | (1) Tsinghua Shenzhen International Graduate School, Tsinghua University; (2) School of Computer Science and Engineering, Sun Yat-sen University; (3) University College London; (4) University of Cambridge |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the algorithms and optimization steps in paragraph form. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It does not include a repository link or an explicit statement of code release. |
| Open Datasets | Yes | MIR-FLICKR25K (Huiskes and Lew 2008) is a dataset of 25,000 Flickr images associated with 1,386 tags. NUS-WIDE (Chua et al. 2009) is a large-scale web image dataset also collected from Flickr, which contains 269,648 images with 5,018 tags provided by users. |
| Dataset Splits | Yes | 2,000 images are randomly sampled as test queries and the rest are used as the retrieval database and training images. We collect a subset of 193,752 images with the 21 most frequent labels for experiments. We follow (Cao et al. 2017; Liu et al. 2018) to randomly sample 5,000 images as queries and retain the rest as the database, from which we further sample 10,000 images and their tag sets as training data. (A minimal split sketch follows this table.) |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. It mentions using "a standard CNN f" and "AlexNet" as the backbone network, but no specific hardware details. |
| Software Dependencies | Yes | We implement WSDHQ based on TensorFlow (Abadi et al. 2016). We take Word2Vec (Mikolov et al. 2013) as the word embedding model and represent each tag with a 300-dimensional pre-trained embedding. We adopt mini-batch Adam with default parameters as the optimizer. (See the setup sketch below the table.) |
| Experiment Setup | Yes | For the semantic correlation graph, we set the maximum number of neighbors k = 20 for each tag, the correlation threshold τ to 0.75, and the merging threshold ϵ to 0.1. We set the number of negative tags selected in the adaptive cosine margin loss to Kn = 1000. We fine-tune all layers copied from the pre-trained model and train the transform layer via back-propagation from scratch. We adopt mini-batch Adam with default parameters as the optimizer. Besides, we select the learning rate from 10⁻⁵ to 10⁻², the hyper-parameter λ from 10⁻⁵ to 10⁻¹, and γ from [0.3, 0.5, 0.7, 1, 2, 3, 4] via cross-validation. Following (Cao et al. 2016, 2017; Liu et al. 2018; Eghbali and Tahvildari 2019), we adopt K = 256 codewords for each codebook, so the binary index for each image over all M codebooks requires B = M·log₂K = 8M bits (i.e., M bytes). (Graph-construction and code-length sketches follow the table.) |
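
The split protocol quoted in the Dataset Splits row is straightforward to reproduce. Below is a minimal sketch for the NUS-WIDE split (5,000 queries out of 193,752 images, then 10,000 training images drawn from the database); the function name and random seed are assumptions, since the paper specifies neither.

```python
import numpy as np

def split_nus_wide(num_images=193_752, num_queries=5_000,
                   num_train=10_000, seed=0):
    """Sample queries, keep the rest as the retrieval database,
    then draw the training set from the database. Counts follow
    the quoted evidence; the seed is an assumption."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_images)
    query_idx = perm[:num_queries]        # 5,000 test queries
    database_idx = perm[num_queries:]     # remaining 188,752 database images
    train_idx = rng.choice(database_idx, size=num_train, replace=False)
    return query_idx, database_idx, train_idx

query_idx, database_idx, train_idx = split_nus_wide()
```

The same pattern covers MIR-FLICKR25K with num_images=25_000 and num_queries=2_000.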
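The software dependencies likewise translate into a few lines of setup. The sketch below assumes gensim as the loader for the pre-trained 300-dimensional Word2Vec vectors (the paper only cites Mikolov et al. 2013, so the concrete vector file is an assumption) and uses tf.keras's Adam with its defaults, matching "mini-batch Adam with default parameters".

```python
import gensim.downloader as api  # assumed loader; the paper does not name one
import tensorflow as tf

# 300-dimensional pre-trained Word2Vec embeddings; the Google News
# vectors are an assumed concrete choice matching the cited model.
word2vec = api.load("word2vec-google-news-300")
tag_embedding = word2vec["sunset"]      # one tag -> a (300,) vector

# Mini-batch Adam "with default parameters" (lr=1e-3, beta_1=0.9,
# beta_2=0.999 in tf.keras).
optimizer = tf.keras.optimizers.Adam()
```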
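The semantic correlation graph settings (k = 20 neighbors per tag, correlation threshold τ = 0.75) can be illustrated as a k-NN graph over tag embeddings under cosine similarity. This is a sketch of the quoted hyperparameters only; the paper's exact construction, including the merging step governed by ϵ = 0.1, is not reproduced here.

```python
import numpy as np

def build_correlation_graph(tag_emb, k=20, tau=0.75):
    """Keep at most k neighbors per tag, and only edges whose cosine
    similarity exceeds tau, per the quoted settings."""
    normed = tag_emb / np.linalg.norm(tag_emb, axis=1, keepdims=True)
    sim = normed @ normed.T               # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)        # exclude self-edges
    edges = {}
    for i in range(sim.shape[0]):
        top_k = np.argsort(sim[i])[::-1][:k]
        edges[i] = [j for j in top_k if sim[i, j] > tau]
    return edges
```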
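Finally, the code-length accounting B = M·log₂K follows directly from indexing one of K codewords per codebook: with K = 256, each codebook costs 8 bits (one byte). A quick check:

```python
import math

def code_length_bits(M, K=256):
    """B = M * log2(K): bits to index one codeword in each of M codebooks."""
    return M * int(math.log2(K))

for M in (2, 4, 8):        # 16-, 32-, and 64-bit codes
    print(f"M={M}: {code_length_bits(M)} bits = {code_length_bits(M) // 8} bytes")
```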