Boosting Few-Shot Learning via Attentive Feature Regularization

Authors: Xingyu Zhu, Shuo Wang, Jinda Lu, Yanbin Hao, Haifeng Liu, Xiangnan He

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical studies on several popular FSL benchmarks demonstrate the effectiveness of AFR, which improves the recognition accuracy of novel categories without the need to retrain any feature extractor, especially in the 1-shot setting.
Researcher Affiliation | Collaboration | Xingyu Zhu (1, 2), Shuo Wang (1, 2)*, Jinda Lu (1, 2), Yanbin Hao (1, 2), Haifeng Liu (3), Xiangnan He (1, 2); (1) Department of Electronic Engineering and Information Science, University of Science and Technology of China; (2) MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China; (3) Brain-Inspired Technology Co., Ltd.
Pseudocode | No | The paper describes its method in detailed text and mathematical equations, and uses diagrams (e.g., Figure 2, Figure 3) to illustrate components and calculations, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about making its source code available, nor a link to a code repository for the described methodology.
Open Datasets | Yes | We evaluate our method on three benchmark datasets, i.e., Mini-ImageNet (Vinyals et al. 2016), Tiered-ImageNet (Ren et al. 2018), and Meta-Dataset (Triantafillou et al. 2019).
Dataset Splits | Yes | Mini-ImageNet consists of 100 categories, each with 600 images. It is divided into three parts: 64 base categories for training, 16 novel categories for validation, and the remaining 20 categories for testing. Similarly, Tiered-ImageNet consists of 779,165 images from 608 categories, where 351 base categories are used for training, 97 novel categories for validation, and the remaining 160 novel categories for testing. In each task, N novel categories are sampled at random, then K samples from each of the N categories are drawn for training, and finally 15 samples (disjoint from the previous K) from each of the N categories are drawn for testing (see the episode-sampling sketch after the table).
Hardware Specification | No | The paper mentions using a 'ResNet-12' backbone, but does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for conducting the experiments.
Software Dependencies | No | The paper mentions "Adam optimization (Kingma and Ba 2015)" but does not specify any software dependencies with version numbers for libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used for the implementation.
Experiment Setup | Yes | These features are used to train the classifier Γ using the loss function L defined in Eq. (11) for a total of 1000 epochs. We employ Adam optimization (Kingma and Ba 2015) with a learning rate of 0.001 and a weight decay of 0.0001 during the training process. To balance the optimization of these losses, we set µ1 = 5 and µ2 = 20 empirically, following (Li et al. 2022). (A training-loop sketch follows the episode-sampling code below.)
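
The episodic protocol quoted in the Dataset Splits row (sample N novel categories, K support images each, plus 15 disjoint query images each) can be made concrete with a short sketch. The Python snippet below is illustrative only: it assumes a split is a dict mapping category names to lists of images, the 5-way default reflects common FSL practice rather than a quoted value, and all identifiers (`sample_episode`, `n_way`, `k_shot`, `n_query`) are hypothetical, not from the paper.

```python
import random

def sample_episode(split, n_way=5, k_shot=1, n_query=15):
    """Draw one N-way K-shot task from `split` (dict: category -> images).

    Returns disjoint support and query sets as (image, episode_label)
    pairs: K support and 15 query images per sampled category.
    """
    categories = random.sample(sorted(split), n_way)
    support, query = [], []
    for label, cat in enumerate(categories):
        images = random.sample(split[cat], k_shot + n_query)
        support += [(img, label) for img in images[:k_shot]]  # training
        query += [(img, label) for img in images[k_shot:]]    # testing
    return support, query
```

For the 1-shot setting the paper emphasizes, this would be called as `sample_episode(test_split, n_way=5, k_shot=1)`.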
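
The Experiment Setup row likewise pins down the optimizer and schedule precisely enough for a skeleton. The PyTorch sketch below encodes only the quoted values (Adam, learning rate 0.001, weight decay 0.0001, 1000 epochs, µ1 = 5, µ2 = 20); the two regularization terms are zero placeholders standing in for the unquoted components of the paper's Eq. (11), and `train_classifier` and `feature_loader` are hypothetical names.

```python
import torch
import torch.nn.functional as F

def train_classifier(classifier, feature_loader, mu1=5.0, mu2=20.0, epochs=1000):
    """Train the classifier on pre-extracted features (the backbone stays frozen)."""
    optimizer = torch.optim.Adam(
        classifier.parameters(), lr=0.001, weight_decay=0.0001)
    for _ in range(epochs):
        for features, labels in feature_loader:
            logits = classifier(features)
            # Placeholder zeros for the two regularization terms of Eq. (11);
            # the paper defines their actual form, which is not quoted here.
            reg1 = logits.new_zeros(())
            reg2 = logits.new_zeros(())
            loss = F.cross_entropy(logits, labels) + mu1 * reg1 + mu2 * reg2
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return classifier
```

With the placeholders at zero this reduces to plain cross-entropy training; the µ-weighted terms show where the paper's regularizers would attach.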