Contextual Similarity Aggregation with Self-attention for Visual Re-ranking
Authors: Jianbo Ouyang, Hui Wu, Min Wang, Wengang Zhou, Houqiang Li
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method. |
| Researcher Affiliation | Academia | 1 CAS Key Laboratory of Technology in GIPAS, EEIS Department, University of Science and Technology of China; 2 Institute of Artificial Intelligence, Hefei Comprehensive National Science Center. {ouyjb,wh241300}@mail.ustc.edu.cn, wangmin@iai.ustc.edu.cn, {zhwg,lihq}@ustc.edu.cn |
| Pseudocode | No | The paper describes its methods but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code will be released at https://github.com/MCC-WH/CSA. |
| Open Datasets | Yes | Four image retrieval benchmark datasets, namely Revisited Oxford5k (ROxf), Revisited Paris6k (RPar), ROxf + R1M, and RPar + R1M, are used to evaluate our method. The ROxf [33] and RPar [33] datasets are the revisited versions of the original Oxford5k [30] and Paris6k [31] datasets. rSfM120k [34] is used to create training samples. |
| Dataset Splits | No | The paper uses rSfM120k to create training samples but does not specify a train/validation/test split for that dataset. It reports evaluation on the benchmark datasets, which serve as test sets. |
| Hardware Specification | Yes | The computation is performed on a single 2080Ti GPU. ... The model is trained for 100 epochs on four 2080Ti GPUs. |
| Software Dependencies | No | The paper mentions optimizers (SGD) and neural network components (transformer encoder, MHA, FFNs, GELU, MSE loss) but does not list specific software library names with version numbers. |
| Experiment Setup | Yes | Our model consists of a stack of 2 transformer encoder layers, each with 12 heads of 64 dimensions. ... SGD is used to optimize the model, with an initial learning rate of 0.1, a weight decay of 10⁻⁵, and a momentum of 0.9. ... The temperature in Eq. (6) is set as 2.0. The batch size is set to 256. The model is trained for 100 epochs... |
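
For concreteness, the hyperparameters quoted in the Experiment Setup row map onto a standard transformer configuration roughly as follows. This is a minimal PyTorch sketch, assuming the stock `nn.TransformerEncoder` with a conventional 4x feed-forward width and treating the data pipeline and regression target as placeholders; it is not the authors' implementation, which is to be released at the CSA repository linked above.

```python
# Sketch of the reported training configuration. Only the hyperparameters
# quoted in the table come from the paper; the encoder input, FFN width,
# and data loading are assumptions for illustration.
import torch
import torch.nn as nn

D_MODEL = 12 * 64        # 12 heads of 64 dimensions each -> model width 768
TEMPERATURE = 2.0        # used in the paper's Eq. (6); its role is not reproduced here

encoder_layer = nn.TransformerEncoderLayer(
    d_model=D_MODEL,
    nhead=12,
    dim_feedforward=4 * D_MODEL,  # assumed; the paper's FFN width is not quoted
    activation="gelu",            # GELU, per the Software Dependencies row
    batch_first=True,
)
model = nn.TransformerEncoder(encoder_layer, num_layers=2)  # stack of 2 layers

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,
    weight_decay=1e-5,
    momentum=0.9,
)
criterion = nn.MSELoss()  # MSE loss, per the Software Dependencies row

# Training-loop skeleton (loader and target are hypothetical placeholders):
# for epoch in range(100):               # 100 epochs, batch size 256
#     for batch, target in loader:
#         refined = model(batch)         # refined contextual similarity features
#         loss = criterion(refined, target)
#         optimizer.zero_grad()
#         loss.backward()
#         optimizer.step()
```

Note that 12 heads of 64 dimensions each implies a model width of 768, which is why the sketch derives `D_MODEL` that way rather than hard-coding it.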