InsCLR: Improving Instance Retrieval with Self-Supervision
Authors: Zelu Deng, Yujie Zhong, Sheng Guo, Weilin Huang
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that InsCLR achieves similar or even better performance than the state-of-the-art SSL methods on instance retrieval. ... We conduct ablation studies on two public benchmarks: Oxford and Paris with revisited annotations (Radenović et al. 2018), denoted by ROxf and RPar, respectively. |
| Researcher Affiliation | Industry | Zelu Deng (1*), Yujie Zhong (2*), Sheng Guo (3), Weilin Huang (4); (1) Dmall, (2) Meituan Inc., (3) MY Bank, Ant Group, (4) Alibaba Group. zelu.deng@dmall.com, jaszhong@hotmail.com, {guosheng.guosheng, weilin.hwl}@alibaba-inc.com |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. Methods are described in text and diagrams. |
| Open Source Code | Yes | Code is available at https://github.com/zeludeng/insclr. |
| Open Datasets | Yes | The training data is a subset of GLDv2 (Ozaki and Yokoo 2019). The dataset contains 1.2M images from 27k landmarks. ... We conduct ablation studies on two public benchmarks: Oxford and Paris with revisited annotations (Radenović et al. 2018), denoted by ROxf and RPar, respectively. ... To showcase the generalization of InsCLR, we fine-tune an ImageNet-pretrained ResNet-50 with GeM (p = 3) on another instance retrieval benchmark: INSTRE (Wang and Jiang 2015). |
| Dataset Splits | Yes | Table 4: Retrieval task on GLDv2 (% mAP@100). (Weyand et al. 2020), labels: Yes, Val set 23.30, Test set 25.57; ImageNet pretrained, labels: No, Val set 0.89, Test set 0.52; InsCLR, labels: No, Val set 13.39, Test set 13.71. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments (e.g., specific GPU/CPU models, memory details). |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | Network architecture. To make a fair comparison, we adopt a simple network architecture to produce image-level features. As shown in Figure 2 (top-middle), it consists of three components: a backbone network, a spatial pooling layer and an embedding module. ... Training details. The training data is a subset of GLDv2 (Ozaki and Yokoo 2019). The dataset contains 1.2M images from 27k landmarks. Unless specified, the size of the offline-computed candidate pool P is set to be 500 for every image, and Nb is set to be 3 for all networks. ... In the rest of the experiments, a threshold of Tb = 0.65 with the unaugmented similarity is adopted, with Nb = 3. ... In the rest of the experiments, avg and topk are adopted with 4 iterations in the mining. |
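The Experiment Setup row quotes the paper's descriptor pipeline: a backbone produces a spatial feature map, GeM pooling (p = 3) collapses it to a vector, and an embedding module yields the final image-level feature. A minimal NumPy sketch of GeM pooling followed by L2 normalization is below; the backbone and the paper's actual embedding module (which may include a learned projection or whitening) are omitted, and the toy feature map is an invented stand-in, so this is an illustration of the pooling step only, not the authors' implementation.

```python
import numpy as np

def gem_pool(feat, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling over the spatial dims of a
    C x H x W feature map: ((1/HW) * sum(x^p))^(1/p) per channel.
    p=1 recovers average pooling; large p approaches max pooling."""
    clamped = np.clip(feat, eps, None)  # GeM assumes non-negative activations
    return np.mean(clamped ** p, axis=(1, 2)) ** (1.0 / p)

def embed(feat, p=3.0):
    """Pool a feature map and L2-normalize, a simplified stand-in for
    the pooling + embedding stage of the descriptor pipeline."""
    v = gem_pool(feat, p)
    return v / np.linalg.norm(v)

# toy feature map: 4 channels over a 5x5 spatial grid (hypothetical values)
fmap = np.abs(np.random.default_rng(0).normal(size=(4, 5, 5)))
desc = embed(fmap, p=3.0)  # unit-norm 4-dim descriptor
```

With p = 3 (the value quoted from the paper), channels with a few strong activations contribute more than under plain average pooling, which is the usual motivation for GeM in retrieval descriptors.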