Collective Deep Quantization for Efficient Cross-Modal Retrieval
Authors: Yue Cao, Mingsheng Long, Jianmin Wang, Shichen Liu
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that CDQ yields state of the art cross-modal retrieval results on standard benchmarks. |
| Researcher Affiliation | Academia | Yue Cao, Mingsheng Long, Jianmin Wang, Shichen Liu KLiss, MOE; TNList; School of Software, Tsinghua University, Beijing, China {caoyue10,liushichen95}@gmail.com {mingsheng,jimwang}@tsinghua.edu.cn |
| Pseudocode | No | The paper describes algorithms but does not include a figure, block, or section labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper does not provide an explicit statement or link to its open-source code. |
| Open Datasets | Yes | NUS-WIDE (Chua et al. 2009) is a public web image dataset. MIRFlickr (Huiskes and Lew 2008) consists of 25,000 images collected from the Flickr website. |
| Dataset Splits | Yes | In NUS-WIDE, we randomly select 100 pairs per class as the query set, 500 pairs per class as the training set and 50 pairs per class as the validation set. In MIR-Flickr, we randomly select 1000 pairs as the query set, 4000 pairs as the training set and 1000 pairs as the validation set. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used to run its experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions 'Tensor Flow' but does not provide specific version numbers for it or any other software dependencies. |
| Experiment Setup | Yes | We use mini-batch SGD with 0.9 momentum, fix mini-batch size as 64, and cross-validate the learning rate. We follow similar strategy in (Long et al. 2016): (1) set the dimension of bottleneck layer D = 128 such that the composite quantizer can quantize the bottleneck representations accurately; (2) set K = 256 codewords for each codebook; (3) for each data point, the binary code of all M subspaces requires B = M log2 K = 8M bits (i.e. M bytes) for compact coding, where we set M = B/8 as B is known. |
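The coding-length arithmetic quoted in the experiment-setup row (B = M log2 K bits with K = 256 codewords, so each item costs M bytes) can be sanity-checked with a short sketch. This is purely illustrative arithmetic, not code from the paper; the function name is hypothetical.

```python
import math

def code_length_bits(M, K=256):
    """Bits per item under composite quantization:
    M codebooks, each with K codewords -> M * log2(K) bits."""
    return M * int(math.log2(K))

# With K = 256, each codebook index costs log2(256) = 8 bits,
# so an item quantized with M codebooks takes exactly M bytes,
# consistent with the paper's rule M = B / 8.
for M in (1, 2, 4):
    B = code_length_bits(M)
    print(f"M={M}: B={B} bits ({B // 8} bytes)")
```

Running this prints B = 8, 16, and 32 bits for M = 1, 2, and 4, matching the stated B = 8M relationship.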