Binary Generative Adversarial Networks for Image Retrieval
Authors: Jingkuan Song, Tao He, Lianli Gao, Xing Xu, Alan Hanjalic, Heng Tao Shen
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on standard datasets (CIFAR-10, NUSWIDE, and Flickr) demonstrate that our BGAN significantly outperforms existing hashing methods by up to 107% in terms of mAP (See Table 2). We evaluate our BGAN on the task of large-scale image retrieval. Specifically, the experiments are designed to study the following research questions of our algorithm: RQ1: How does each component of our algorithm affect the performance? RQ2: Do the binary codes computed directly without relaxation improve the performance over the relaxed solution? RQ3: Does the performance of BGAN significantly outperform the state-of-the-art hashing algorithms? RQ4: What is the efficiency of BGAN? |
| Researcher Affiliation | Academia | Jingkuan Song,1 Tao He,1 Lianli Gao,1 Xing Xu,1 Alan Hanjalic,2 Heng Tao Shen1 1Center for Future Media and School of Computer Science and Engineering, University of Electronic Science and Technology of China. 2Delft University of Technology, Netherlands. {jingkuan.song, hetaoconquer}@gmail.com, {lianli.gao,xing.xu}@uestc.edu.cn, a.hanjalic@tudelft.nl, shenhengtao@hotmail.com |
| Pseudocode | No | The paper describes the system architecture and training process, but it does not include a clearly labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | Yes | Our code: https://github.com/htconquer/BGAN |
| Open Datasets | Yes | We conduct empirical evaluation on three public benchmark datasets, CIFAR-10, NUS-WIDE, and Flickr. CIFAR-10 is a labeled subset of the 80 million tiny images dataset, consisting of 60,000 32x32 color images in 10 classes, with 6,000 images per class. NUS-WIDE is a web image dataset containing 269,648 images downloaded from Flickr. Tagging ground-truth for 81 semantic concepts is provided for evaluation. We follow the settings in (Zhu et al. 2016) and use the subset of 195,834 images from the 21 most frequent concepts, where each concept consists of at least 5,000 images. Flickr is a collection of about 25,000 images from Flickr, where each image is labeled with one of the 38 concepts. |
| Dataset Splits | No | In NUS-WIDE and CIFAR-10, we randomly select 100 images per class as the test query set, and 1,000 images per class as the training set. In Flickr, we randomly select 1,000 images as the test query set, and 4,000 images for training. The paper specifies training and test sets but does not explicitly mention a separate validation set. |
| Hardware Specification | No | The paper mentions training and testing times but does not provide specific hardware details such as CPU/GPU models or memory specifications. |
| Software Dependencies | No | The paper does not list specific versions for software dependencies, libraries, or frameworks used in the experiments (e.g., Python version, deep learning framework version). |
| Experiment Setup | Yes | By default, we set λ1 = 0.1 and λ2 = 0.1. We set the mini-batch size as 256, and the learning rate as 0.01. For the hashing layer, we start training BGAN with βt = 1. For each stage t, after BGAN converges, we increase βt and train (i.e., fine-tune) BGAN by setting the converged network parameters as the initialization for training the BGAN in the next stage. As βt → ∞, the network converges to BGAN with sgn(z) as the activation function, which can generate the desired binary codes. Using βt = 10 we can already achieve fast convergence for training BGAN. |
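The per-class protocol quoted under "Dataset Splits" (e.g., 100 test queries and 1,000 training images per class for CIFAR-10 and NUS-WIDE) can be sketched as below. This is an illustrative reconstruction, not the paper's released code; the function name and seeding are assumptions.

```python
import random
from collections import defaultdict

def per_class_split(labels, n_test=100, n_train=1000, seed=0):
    """Randomly pick n_test query and n_train training indices per class.

    Illustrative sketch of the split described in the paper; the
    released BGAN code may implement this differently.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    test_idx, train_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        test_idx.extend(idxs[:n_test])                  # query set
        train_idx.extend(idxs[n_test:n_test + n_train])  # training set
    return train_idx, test_idx

# Tiny usage example: 2 classes with 30 items each, 5 queries and
# 10 training images per class.
labels = [0] * 30 + [1] * 30
train, test = per_class_split(labels, n_test=5, n_train=10)
```

Note that, as in the paper's protocol, the test queries are sampled first and the training set is drawn from the remaining images, so the two sets are disjoint.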
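The staged βt schedule quoted under "Experiment Setup" is a continuation scheme: the hashing layer uses a smooth surrogate for sgn(z) that hardens as βt grows, and the network converged at one stage initializes the next. A minimal sketch, assuming tanh(βt·z) as the surrogate (the stage values and helper names here are illustrative, not taken from the released code):

```python
import math

def hash_activation(z, beta_t):
    """Smooth surrogate for sgn(z); approaches sgn as beta_t -> inf."""
    return math.tanh(beta_t * z)

# Continuation: train to convergence at each stage, then raise beta_t,
# reusing the converged parameters as the next stage's initialization.
for beta_t in (1, 2, 5, 10):  # schedule values are an assumption
    # ... fine-tune BGAN to convergence with hash_activation(., beta_t),
    # initialized from the previous stage's converged parameters ...
    pass

# Per the paper, beta_t = 10 already suffices: the surrogate is then
# nearly binary for moderate pre-activations z.
```

The design rationale is that directly optimizing through sgn(z) gives zero gradients almost everywhere, while the annealed surrogate stays differentiable at every stage yet ends close to the desired binary codes.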