Collective Deep Quantization for Efficient Cross-Modal Retrieval

Authors: Yue Cao, Mingsheng Long, Jianmin Wang, Shichen Liu

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that CDQ yields state-of-the-art cross-modal retrieval results on standard benchmarks.
Researcher Affiliation | Academia | Yue Cao, Mingsheng Long, Jianmin Wang, Shichen Liu; KLiss, MOE; TNList; School of Software, Tsinghua University, Beijing, China. Emails: {caoyue10,liushichen95}@gmail.com, {mingsheng,jimwang}@tsinghua.edu.cn
Pseudocode | No | The paper describes its algorithms in prose but includes no figure, block, or section labeled "Pseudocode" or "Algorithm".
Open Source Code | No | The paper provides no statement about, or link to, open-source code.
Open Datasets | Yes | NUS-WIDE (Chua et al. 2009) is a public web image dataset; MIRFlickr (Huiskes and Lew 2008) consists of 25,000 images collected from the Flickr website.
Dataset Splits | Yes | For NUS-WIDE, the authors randomly select 100 pairs per class as the query set, 500 pairs per class as the training set, and 50 pairs per class as the validation set. For MIRFlickr, they randomly select 1000 pairs as the query set, 4000 pairs as the training set, and 1000 pairs as the validation set (see the split sketch after this table).
Hardware Specification | No | The paper does not describe the hardware used to run its experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions TensorFlow but gives no version numbers for it or for any other software dependency.
Experiment Setup | Yes | The authors use mini-batch SGD with 0.9 momentum, fix the mini-batch size at 64, and cross-validate the learning rate. Following a strategy similar to (Long et al. 2016), they (1) set the bottleneck-layer dimension D = 128 so that the composite quantizer can quantize the bottleneck representations accurately; (2) use K = 256 codewords per codebook; and (3) encode each data point with B = M log₂ K = 8M bits (i.e., M bytes) across the M subspaces, setting M = B/8 for a given code length B (see the sizing sketch after this table).