Attention Shifting to Pursue Optimal Representation for Adapting Multi-granularity Tasks

Authors: Gairui Bai, Wei Xi, Yihan Zhao, Xinhui Liu, Jizhong Zhao

IJCAI 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate the effectiveness of SegAS in multi-granularity recognition across three tasks. |
| Researcher Affiliation | Academia | School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China |
| Pseudocode | No | The paper describes the proposed method in text and with a diagram, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing open-source code or a link to a code repository. |
| Open Datasets | Yes | We evaluated the performance of SegAS in various experiments, including occlusion recognition, object detection, and fine-grained recognition. Specifically, we performed these experiments on the ImageNet-100 [Russakovsky et al., 2015], Pascal VOC [Everingham et al., 2010], Places205 [Zhou et al., 2014], COCO [Lin et al., 2014], and CUB-200-2011 [Welinder et al., 2010] datasets. |
| Dataset Splits | No | The paper mentions fine-tuning on training sets and evaluating on test sets, and references dataset names, but it does not provide specific percentages or counts for training, validation, or test splits, nor does it explicitly describe a validation set in its primary experimental setup. |
| Hardware Specification | Yes | To verify the efficiency of our proposed method, we conducted experiments on four NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions using the SGD optimizer and a ResNet50 backbone but does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | The model was trained using the SGD [Robbins and Monro, 1951] optimizer with a weight decay of 1×10⁻⁴ and momentum of 0.9. The temperature parameter τ was always set to 0.2. The total number of epochs was set to 200. On ImageNet-1k, the number of semantic levels was defined as L = 3 with (M1, M2, M3) = (30000, 10000, 1000); details are in Appendix B. |
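The reported hyperparameters (SGD with momentum 0.9 and weight decay 1×10⁻⁴, temperature τ = 0.2) can be sketched in a few lines of pure Python. This is a minimal illustration of those settings only; the function names are ours and the paper publishes no code, so none of this reflects the authors' actual implementation.

```python
import math

# Hyperparameters as reported in the paper's experiment setup.
MOMENTUM = 0.9
WEIGHT_DECAY = 1e-4
TAU = 0.2  # temperature parameter for the contrastive objective

def sgd_step(params, grads, velocities, lr=0.1):
    """One SGD-with-momentum update, with L2 weight decay folded
    into the gradient (the classical PyTorch-style formulation)."""
    for i, (p, g) in enumerate(zip(params, grads)):
        g = g + WEIGHT_DECAY * p                  # weight decay term
        velocities[i] = MOMENTUM * velocities[i] + g
        params[i] = p - lr * velocities[i]
    return params, velocities

def temperature_softmax(logits, tau=TAU):
    """Softmax over similarity scores scaled by temperature tau,
    as commonly used in temperature-scaled contrastive losses."""
    scaled = [x / tau for x in logits]
    m = max(scaled)                               # subtract max for stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]
```

A smaller τ (such as the reported 0.2) sharpens the softmax distribution, which is the usual motivation for temperature scaling in contrastive training.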