Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Contextual Similarity Aggregation with Self-attention for Visual Re-ranking

Authors: Jianbo Ouyang, Hui Wu, Min Wang, Wengang Zhou, Houqiang Li

NeurIPS 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method.
Researcher Affiliation | Academia | (1) CAS Key Laboratory of Technology in GIPAS, EEIS Department, University of Science and Technology of China; (2) Institute of Artificial Intelligence, Hefei Comprehensive National Science Center. EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes its methods but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code will be released at https://github.com/MCC-WH/CSA.
Open Datasets | Yes | Four image retrieval benchmark datasets, named Revisited Oxford5k (ROxf), Revisited Paris6k (RPar), ROxf + R1M, and RPar + R1M, are used to evaluate our method. The ROxf [33] and RPar [33] datasets are the revisited versions of the original Oxford5k [30] and Paris6k [31] datasets. rSfM120k [34] is used to create training samples.
Dataset Splits | No | The paper uses rSfM120k for training samples but does not specify a train/validation/test split for this dataset during model training. It mentions evaluation on benchmark datasets, which are typically used as test sets.
Hardware Specification | Yes | The computation is performed on a single 2080Ti GPU. ... The model is trained for 100 epochs on four 2080Ti GPUs.
Software Dependencies | No | The paper mentions optimizers (SGD) and neural network components (transformer encoder, MHA, FFNs, GELU, MSE loss) but does not list specific software library names with version numbers.
Experiment Setup | Yes | Our model consists of a stack of 2 transformer encoder layers, each with 12 heads of 64 dimensions. ... SGD is used to optimize the model, with an initial learning rate of 0.1, a weight decay of 1e-5, and a momentum of 0.9. ... The temperature in Eq. (6) is set as 2.0. The batch size is set to 256. The model is trained for 100 epochs...
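
To make the quoted configuration concrete, the sketch below instantiates it with PyTorch's built-in nn.TransformerEncoder. It is a minimal sketch under stated assumptions, not the authors' implementation (their code is at the GitHub link above): the contextual-similarity featurization, the number of re-ranked candidates (TOP_K), the MSE supervision target, and the scalar score head are all illustrative assumptions, and the temperature for Eq. (6) is omitted because that equation is not reproduced in this table.

```python
# Minimal sketch of the quoted training setup, assuming PyTorch's standard
# nn.TransformerEncoder. Anything not quoted in the table above (input
# featurization, TOP_K, MSE target, score head) is an illustrative assumption.
import torch
import torch.nn as nn

NUM_LAYERS = 2                    # "a stack of 2 transformer encoder layers"
NUM_HEADS = 12                    # "12 heads of 64 dimensions"
HEAD_DIM = 64
D_MODEL = NUM_HEADS * HEAD_DIM    # 12 * 64 = 768
TOP_K = 50                        # assumed number of re-ranked candidates

encoder_layer = nn.TransformerEncoderLayer(
    d_model=D_MODEL,
    nhead=NUM_HEADS,
    activation="gelu",            # GELU, per the components listed above
    batch_first=True,             # inputs are (batch, candidates, features)
)
model = nn.TransformerEncoder(encoder_layer, num_layers=NUM_LAYERS)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,                       # initial learning rate 0.1
    weight_decay=1e-5,            # weight decay 1e-5
    momentum=0.9,                 # momentum 0.9
)
criterion = nn.MSELoss()          # MSE loss, per the components listed above

# One hypothetical training step at the quoted batch size of 256; random
# tensors stand in for the contextual similarity features and their targets.
feats = torch.randn(256, TOP_K, D_MODEL)
target = torch.randn(256, TOP_K, D_MODEL)
loss = criterion(model(feats), target)
loss.backward()
optimizer.step()
optimizer.zero_grad()

# Hypothetical inference-time re-ranking: project refined features to scalar
# similarity scores (the paper's exact output mapping is not quoted above)
# and re-sort the initial candidate list by those scores.
score_head = nn.Linear(D_MODEL, 1)
with torch.no_grad():
    scores = score_head(model(feats)).squeeze(-1)   # (batch, TOP_K)
    new_order = torch.argsort(scores, dim=-1, descending=True)
```

In this reading, self-attention lets each of the TOP_K candidates aggregate contextual similarity information from the others before its score is refined, which is what re-sorting by the refined scores exploits.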