Coarse2Fine: Local Consistency Aware Re-prediction for Weakly Supervised Object Localization
Authors: Yixuan Pan, Yao Yao, Yichao Cao, Chongjin Chen, Xiaobo Lu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that our LCAR outperforms the state-of-the-art on both the CUB-200-2011 and ILSVRC datasets, achieving 95.9% and 70.7% GT-Known localization accuracy, respectively. |
| Researcher Affiliation | Academia | 1School of Automation, Southeast University, Nanjing, China. 2Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Nanjing, China. 3Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences Shanghai, China. 4University of Chinese Academy of Sciences, Beijing, China. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper. |
| Open Datasets | Yes | LCAR is evaluated on two widely used, challenging WSOL benchmarks: CUB-200-2011 (Wah et al. 2011) and ILSVRC (Russakovsky et al. 2015). |
| Dataset Splits | No | CUB-200-2011 is a fine-grained dataset containing 200 different bird species, consisting of 5,994 training images and 5,794 test images. For ILSVRC, we chose the subset which contains 1.2 million training images and 50,000 test images for the WSOL task. The paper does not explicitly mention a validation split. |
| Hardware Specification | No | The paper states 'We thank the Big Data Center of Southeast University for providing facility support for the numerical calculations in this paper,' but does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions models like ViT-S/16 and DINO, but does not provide specific ancillary software details with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Implementation Details. We adopt the ViT-S/16 model (Dosovitskiy et al. 2020) pretrained by DINO (Caron et al. 2021) as the backbone. ... For the self-distillation loss, we set α to 0.3 and the temperature factor T to 3. For testing, we fix the background threshold β for generating the localization seed region to 0.6. In ARM, the position term factor wp is set to 0.01, the iterative update speed factor η is set to 0.1, and the number of iterations is fixed to 50. In SGRM, the threshold values γl1, γh1, γl2, γh2 are set to 0.1, 0.6, 0.1, and 0.4, respectively (γh2 is set to 0.6 on ILSVRC), and the loss-function weights w1, w2, and w3 are fixed to 2, 0.1, and 0.1, respectively. We use SGD to optimize Aggregation Net, with momentum and learning rate set to 0.9 and 0.05, respectively. |
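Since the authors released no code, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. The dictionary keys below are illustrative names chosen for readability, not identifiers from the authors' implementation; only the values come from the paper.

```python
# Hyperparameters reported in the LCAR paper (AAAI 2023).
# Key names are illustrative; values are as stated in the paper.
LCAR_CONFIG = {
    "backbone": "ViT-S/16, DINO-pretrained",
    # Self-distillation loss
    "distill_alpha": 0.3,
    "distill_temperature": 3,
    # Test-time seed-region generation
    "background_threshold_beta": 0.6,
    # ARM (iterative refinement)
    "arm_position_term_wp": 0.01,
    "arm_update_speed_eta": 0.1,
    "arm_num_iterations": 50,
    # SGRM thresholds (gamma_h2 is raised to 0.6 on ILSVRC)
    "sgrm_gamma_l1": 0.1,
    "sgrm_gamma_h1": 0.6,
    "sgrm_gamma_l2": 0.1,
    "sgrm_gamma_h2": 0.4,
    # SGRM loss weights
    "sgrm_w1": 2.0,
    "sgrm_w2": 0.1,
    "sgrm_w3": 0.1,
    # Aggregation Net optimizer (SGD)
    "sgd_momentum": 0.9,
    "sgd_learning_rate": 0.05,
}
```

Note that the paper does not report batch size, number of epochs, or data augmentation, so this sketch alone would not suffice for an exact reproduction.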