Contrastive Transformer Masked Image Hashing for Degraded Image Retrieval
Authors: Xiaobo Shen, Haoyu Cai, Xiuwen Gong, Yuhui Zheng
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical studies conducted on three benchmark datasets demonstrate the superiority of the proposed CTMIH over the state-of-the-art in both degraded and normal image retrieval. |
| Researcher Affiliation | Academia | 1Nanjing University of Science and Technology 2University of Technology Sydney 3Qinghai Normal University |
| Pseudocode | Yes | Algorithm 1 Image Transformation T. Input: Image X, hyper-parameter δ; Output: Transformed Image. 1: Crop X to size 256×256 randomly; 2: Resize X to size 224×224; 3: Flip X horizontally with a probability of 0.5δ; 4: Add color jitter to X with a probability of 0.8δ; 5: Convert X to a grayscale image with a probability of 0.4δ; 6: Apply Gaussian blur to X with a probability of 0.5δ. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | MSCOCO [Lin et al., 2014] is a large-scale image dataset for object detection, segmentation, and captioning. NUS-WIDE [Chua et al., 2009] is a multi-label dataset. ImageNet [Russakovsky et al., 2014] is a single-label image dataset. |
| Dataset Splits | Yes | MSCOCO [Lin et al., 2014]: 5,000 images are randomly selected as the query set and the remaining images are used as the database; 10,000 images are randomly selected from the database for training. NUS-WIDE [Chua et al., 2009]: 100 images are randomly sampled from each category as the query set and the remaining images are used as the database; 500 images per category are randomly sampled from the database for training. ImageNet [Russakovsky et al., 2014]: 100 images from each category are randomly sampled for training, 5,000 images are sampled as the query set, and the remaining images are used as the database. |
| Hardware Specification | No | The paper states: 'The standard ViT-Base is used as the backbone', but does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions the 'Adam optimizer' but does not provide version numbers for any software libraries or frameworks (e.g., PyTorch), programming languages (e.g., Python), or other dependencies. |
| Experiment Setup | Yes | For the proposed method, we apply Algorithm 1 on each image in the training set to generate two augmented views, where δu and δv are set to 0.5 and 1 respectively. ...The masking ratio r is set to 0.3, class probability ϱ+ is set to 0.05, and temperature τ is set to 0.5. The two hyper-parameters α and β are set to 0.1 and 0.1 respectively. The batch size is set to 32, the number of epochs is set to 100, and the learning rates of the ViT and the hash layer are set to 10⁻⁵ and 10⁻³ respectively. The proposed method is trained using the Adam optimizer. |
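The probability structure of Algorithm 1 can be sketched as follows. This is a minimal illustration, not the authors' code: the `image` argument and the operation names (`crop_256`, `hflip`, etc.) are placeholders standing in for real pixel transforms, and only the probability logic (each augmentation gated by a probability scaled by δ) follows the paper.

```python
import random

def transform(image, delta, rng=random):
    """Sketch of Algorithm 1 from the paper: each augmentation is applied
    with a probability scaled by the hyper-parameter delta. Here the
    "image" is ignored and we simply record which operations would run;
    in a real pipeline each name would be a pixel-level transform."""
    ops = []
    ops.append("crop_256")            # step 1: random 256x256 crop (always)
    ops.append("resize_224")          # step 2: resize to 224x224 (always)
    if rng.random() < 0.5 * delta:    # step 3: horizontal flip, p = 0.5*delta
        ops.append("hflip")
    if rng.random() < 0.8 * delta:    # step 4: color jitter, p = 0.8*delta
        ops.append("colorjitter")
    if rng.random() < 0.4 * delta:    # step 5: grayscale, p = 0.4*delta
        ops.append("grayscale")
    if rng.random() < 0.5 * delta:    # step 6: Gaussian blur, p = 0.5*delta
        ops.append("blur")
    return ops
```

With the paper's settings, the two views come from `transform(x, 0.5)` (δu, a weakly augmented view) and `transform(x, 1.0)` (δv, a strongly augmented view); δ = 0 would leave only the deterministic crop and resize.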
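The reported optimizer setup (Adam with learning rate 10⁻⁵ for the ViT backbone and 10⁻³ for the hash layer) maps naturally onto PyTorch parameter groups. The sketch below is a hedged config fragment: `CTMIHModel`, its `backbone`, and `hash_layer` are hypothetical stand-ins, since the paper releases no code.

```python
import torch
import torch.nn as nn

class CTMIHModel(nn.Module):
    """Hypothetical stand-in for the paper's architecture: a ViT-Base
    backbone followed by a hash layer producing binary-like codes."""
    def __init__(self, hash_bits=64):
        super().__init__()
        self.backbone = nn.Linear(768, 768)        # placeholder for ViT-Base
        self.hash_layer = nn.Linear(768, hash_bits)

model = CTMIHModel()
# Two parameter groups with the learning rates reported in the paper:
# 1e-5 for the pretrained backbone, 1e-3 for the randomly initialized hash layer.
optimizer = torch.optim.Adam([
    {"params": model.backbone.parameters(), "lr": 1e-5},
    {"params": model.hash_layer.parameters(), "lr": 1e-3},
])
```

Giving the freshly initialized hash layer a larger learning rate than the pretrained backbone is a common fine-tuning pattern, consistent with the 10⁻⁵/10⁻³ split the paper reports.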