InsCLR: Improving Instance Retrieval with Self-Supervision

Authors: Zelu Deng, Yujie Zhong, Sheng Guo, Weilin Huang

AAAI 2022

Reproducibility Variables

Research Type: Experimental
LLM Response: "Extensive experiments demonstrate that InsCLR achieves similar or even better performance than the state-of-the-art SSL methods on instance retrieval. ... We conduct an ablation study on two public benchmarks: Oxford and Paris with revisited annotations (Radenović et al. 2018), denoted by ROxf and RPar, respectively."

Researcher Affiliation: Industry
LLM Response: "Zelu Deng^1*, Yujie Zhong^2*, Sheng Guo^3, Weilin Huang^4 (1: Dmall; 2: Meituan Inc.; 3: MY Bank, Ant Group; 4: Alibaba Group). zelu.deng@dmall.com, jaszhong@hotmail.com, {guosheng.guosheng, weilin.hwl}@alibaba-inc.com"

Pseudocode: No
LLM Response: The paper does not contain any clearly labeled pseudocode or algorithm blocks. Methods are described in text and diagrams.

Open Source Code: Yes
LLM Response: "Code is available at https://github.com/zeludeng/insclr."

Open Datasets: Yes
LLM Response: "The training data is a subset of GLDv2 (Ozaki and Yokoo 2019). The dataset contains 1.2M images from 27k landmarks. ... We conduct an ablation study on two public benchmarks: Oxford and Paris with revisited annotations (Radenović et al. 2018), denoted by ROxf and RPar, respectively. ... To showcase the generalization of InsCLR, we fine-tune an ImageNet-pretrained ResNet-50 with GeM (p = 3) on another instance retrieval benchmark: INSTRE (Wang and Jiang 2015)."

Dataset Splits: Yes
LLM Response: "Table 4: Retrieval task on GLDv2 (% mAP@100)."

    Method                 Labels  Val set  Test set
    (Weyand et al. 2020)   Yes     23.30    25.57
    ImageNet pretrained    No      0.89     0.52
    InsCLR                 No      13.39    13.71

Hardware Specification: No
LLM Response: The paper does not explicitly describe the hardware used to run its experiments (e.g., specific GPU/CPU models, memory details).

Software Dependencies: No
LLM Response: The paper does not provide specific version numbers for software dependencies or libraries.

Experiment Setup: Yes
LLM Response: "Network architecture. To make a fair comparison, we adopt a simple network architecture to produce image-level features. As shown in Figure 2 (top-middle), it consists of three components: a backbone network, a spatial pooling layer and an embedding module. ... Training details. The training data is a subset of GLDv2 (Ozaki and Yokoo 2019). The dataset contains 1.2M images from 27k landmarks. Unless specified, the size of the offline-computed candidate pool P is set to be 500 for every image, and Nb is set to be 3 for all networks. ... In the rest of the experiments, a threshold of Tb = 0.65 with the unaugmented similarity is adopted, with Nb = 3. ... In the rest of the experiments, avg and topk are adopted with 4 iterations in the mining."
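The Dataset Splits entry above reports results in % mAP@100 on GLDv2. For readers unfamiliar with the metric, the following is a minimal sketch of how mean Average Precision at 100 is typically computed for GLDv2-style retrieval; the function and variable names are illustrative and not taken from the paper's code.

```python
import numpy as np

def average_precision_at_k(ranked_ids, relevant_ids, k=100):
    """AP@k for one query: average of precision@i over ranks i where a relevant item appears."""
    relevant = set(relevant_ids)
    hits, score = 0, 0.0
    for i, idx in enumerate(ranked_ids[:k]):
        if idx in relevant:
            hits += 1
            score += hits / (i + 1)       # precision at this rank
    denom = min(len(relevant), k)         # GLDv2-style normalization by min(#relevant, k)
    return score / denom if denom else 0.0

def map_at_100(all_rankings, all_relevant):
    """mAP@100 over all queries, as reported in the table above."""
    return float(np.mean([average_precision_at_k(r, g, 100)
                          for r, g in zip(all_rankings, all_relevant)]))
```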
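The Experiment Setup entry describes a three-component architecture: a backbone, a spatial pooling layer, and an embedding module, with GeM pooling at p = 3 used for the INSTRE experiments. Below is a minimal PyTorch sketch of such a pipeline, assuming a ResNet-50 backbone and a single linear embedding layer; the embedding dimension and the use of a fixed (non-learnable) p are assumptions, not details confirmed by the paper.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class GeM(nn.Module):
    """Generalized-mean pooling: ((1/|X|) * sum_x x^p)^(1/p); p = 3 per the paper."""
    def __init__(self, p=3.0, eps=1e-6):
        super().__init__()
        self.p, self.eps = p, eps

    def forward(self, x):                       # x: (B, C, H, W) feature map
        x = x.clamp(min=self.eps).pow(self.p)
        return x.mean(dim=(-2, -1)).pow(1.0 / self.p)  # (B, C)

class RetrievalNet(nn.Module):
    """Backbone -> spatial pooling -> embedding, mirroring the three components in Figure 2."""
    def __init__(self, dim=512):                # 512 is an illustrative embedding size
        super().__init__()
        resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool + fc
        self.pool = GeM(p=3.0)
        self.embed = nn.Linear(2048, dim)

    def forward(self, x):
        f = self.embed(self.pool(self.backbone(x)))
        return nn.functional.normalize(f, dim=-1)  # unit-norm descriptors for cosine search
```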
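The same entry states that a candidate pool P of 500 neighbors is precomputed offline for every image, and that candidates above a similarity threshold Tb = 0.65 are used with Nb = 3. As a hedged sketch of what such a selection step could look like, the snippet below picks at most Nb candidates whose cosine similarity exceeds Tb; the paper's actual mining procedure (including the avg/topk variants run for 4 iterations) is more involved, and this function name is hypothetical.

```python
import torch

def select_pseudo_positives(query_feat, pool_feats, tb=0.65, nb=3):
    """Pick up to `nb` pool candidates whose similarity to the query exceeds `tb`.

    query_feat: (D,) L2-normalized descriptor; pool_feats: (P, D), e.g. P = 500.
    Returns indices into the candidate pool (at most nb of them).
    """
    sims = pool_feats @ query_feat              # cosine similarity, assuming unit-norm features
    order = torch.argsort(sims, descending=True)
    return order[sims[order] > tb][:nb]         # threshold first, then keep the top-nb
```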