Contextual Similarity Aggregation with Self-attention for Visual Re-ranking

Authors: Jianbo Ouyang, Hui Wu, Min Wang, Wengang Zhou, Houqiang Li

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method. |
| Researcher Affiliation | Academia | 1. CAS Key Laboratory of Technology in GIPAS, EEIS Department, University of Science and Technology of China; 2. Institute of Artificial Intelligence, Hefei Comprehensive National Science Center. {ouyjb,wh241300}@mail.ustc.edu.cn, wangmin@iai.ustc.edu.cn, {zhwg,lihq}@ustc.edu.cn |
| Pseudocode | No | The paper describes its methods but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code will be released at https://github.com/MCC-WH/CSA. |
| Open Datasets | Yes | Four image retrieval benchmark datasets, named Revisited Oxford5k (ROxf), Revisited Paris6k (RPar), ROxf + R1M, and RPar + R1M, are used to evaluate our method. The ROxf [33] and RPar [33] datasets are the revisited versions of the original Oxford5k [30] and Paris6k [31] datasets. rSfM120k [34] is used to create training samples. |
| Dataset Splits | No | The paper uses rSfM120k for training samples but does not specify a train/validation/test split for this dataset. The benchmark datasets mentioned are used only for evaluation, i.e., as test sets. |
| Hardware Specification | Yes | The computation is performed on a single 2080Ti GPU. ... The model is trained for 100 epochs on four 2080Ti GPUs. |
| Software Dependencies | No | The paper mentions an optimizer (SGD) and neural network components (transformer encoder, MHA, FFNs, GELU, MSE loss) but does not list specific software libraries with version numbers. |
| Experiment Setup | Yes | Our model consists of a stack of 2 transformer encoder layers, each with 12 heads of 64 dimensions. ... SGD is used to optimize the model, with an initial learning rate of 0.1, a weight decay of 10⁻⁵, and a momentum of 0.9. ... The temperature in Eq. (6) is set as 2.0. The batch size is set to 256. The model is trained for 100 epochs... |
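The quoted experiment setup (2 transformer encoder layers, 12 heads of 64 dimensions each, SGD with lr 0.1, weight decay 10⁻⁵, momentum 0.9) can be sketched in PyTorch as below. This is a minimal illustration of the stated hyperparameters, not the authors' released code: the feed-forward width (PyTorch's default of 2048), the sequence length of 10, and the GELU activation placement are assumptions, and the temperature-scaled MSE objective from Eq. (6) is omitted.

```python
import torch
import torch.nn as nn

# From the quoted setup: 12 heads x 64 dims per head = 768-d model width.
D_MODEL = 12 * 64

# A stack of 2 transformer encoder layers with 12 attention heads.
# dim_feedforward is NOT stated in the quote; PyTorch's default (2048) is assumed.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=D_MODEL,
    nhead=12,
    activation="gelu",
    batch_first=True,  # inputs shaped (batch, sequence, dim)
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

# SGD with the quoted hyperparameters.
optimizer = torch.optim.SGD(
    encoder.parameters(), lr=0.1, weight_decay=1e-5, momentum=0.9
)

# Toy forward pass: 4 query images, each with a 10-element similarity context
# (the sequence length of 10 is illustrative, not from the paper).
features = torch.randn(4, 10, D_MODEL)
refined = encoder(features)  # same shape as the input: (4, 10, 768)
```

The quoted batch size of 256 and 100-epoch schedule would apply to the training loop around this model, which is not shown here.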