ESPT: A Self-Supervised Episodic Spatial Pretext Task for Improving Few-Shot Learning

Authors: Yi Rong, Xiongbo Lu, Zhaoyang Sun, Yaxiong Chen, Shengwu Xiong

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments indicate that our ESPT method achieves new state-of-the-art performance for few-shot image classification on three mainstay benchmark datasets.
Researcher Affiliation | Academia | (1) School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China; (2) Sanya Science and Education Innovation Park, Wuhan University of Technology, Sanya 572000, China; (3) Hainan Yazhou Bay Seed Laboratory, Sanya 572025, China; (4) Shanghai Artificial Intelligence Laboratory, Shanghai 200240, China
Pseudocode | Yes | Algorithm 1: Training process of our ESPT method
Open Source Code | Yes | The source code will be available at: https://github.com/Whut-YiRong/ESPT.
Open Datasets | Yes | We verify the effectiveness of our ESPT method on three widely-used datasets for few-shot image classification, including miniImageNet (Vinyals et al. 2016), tieredImageNet (Ren et al. 2018) and CUB-200-2011 (Wah et al. 2011).
Dataset Splits | Yes | Using the class split in (Ravi and Larochelle 2017), we take 64, 16 and 20 classes to construct the training set, validation set and testing set, respectively. (See the split sketch below.)
Hardware Specification | Yes | Intel(R) Xeon(R) Gold 5117 @2.00GHz CPU, NVIDIA A100 Tensor Core GPU, and Ubuntu 18.04.6 LTS operating system.
Software Dependencies | No | The paper mentions implementing the method in "the PyTorch framework" but does not give a version number; the only other environment detail, Ubuntu 18.04.6 LTS, is an operating system rather than a versioned software dependency.
Experiment Setup | Yes | For the miniImageNet dataset, we first pre-train our models for 350 epochs with an initial learning rate of 0.1 and a mini-batch size of 128. The learning rate is decayed by multiplying 0.1 after 200 and 300 epochs. In the subsequent episodic learning phase, the models are fine-tuned for 400 epochs, with each epoch containing 100 few-shot episodes. We initialize the learning rate as 0.001 and cut it by a factor of 10 at 200 and 300 epochs. The same stochastic gradient descent (SGD) optimizer with Nesterov momentum of 0.9 and weight decay of 5e-4 is utilized for model training on the three datasets. (See the training-schedule sketch below.)
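The Dataset Splits row describes a class-level partition of miniImageNet's 100 classes into 64/16/20 for training, validation and testing. The snippet below is a minimal illustrative sketch of such a class-disjoint split; the placeholder class names and the random shuffle are assumptions for illustration only, since the paper uses the fixed class lists of Ravi and Larochelle (2017) rather than a random partition.

```python
import random

# miniImageNet contains 100 classes in total; names here are placeholders.
all_classes = [f"class_{i:03d}" for i in range(100)]
rng = random.Random(0)
rng.shuffle(all_classes)

# Class-disjoint 64/16/20 split, as quoted in the Dataset Splits row.
train_classes = all_classes[:64]      # episodes for (meta-)training
val_classes   = all_classes[64:80]    # model selection
test_classes  = all_classes[80:]      # final evaluation

assert len(train_classes) == 64
assert len(val_classes) == 16
assert len(test_classes) == 20
```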
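The Experiment Setup row fully specifies the optimization schedule for miniImageNet. Below is a minimal sketch of how that schedule could be wired up in PyTorch (the framework the paper reports using). Only the quoted hyperparameters (learning rates, decay milestones, momentum, weight decay, epoch and episode counts) come from the paper; the tiny placeholder backbone and the empty loop bodies are assumptions standing in for the actual model and data pipeline.

```python
import torch
from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder backbone; the paper's actual network is not reproduced here.
model = nn.Sequential(nn.Conv2d(3, 64, 3), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(64, 64))

# Pre-training: 350 epochs, initial lr 0.1, mini-batch size 128,
# lr multiplied by 0.1 after epochs 200 and 300.
pre_opt = SGD(model.parameters(), lr=0.1, momentum=0.9,
              nesterov=True, weight_decay=5e-4)
pre_sched = MultiStepLR(pre_opt, milestones=[200, 300], gamma=0.1)
for epoch in range(350):
    # ... one pass over 128-sample mini-batches, calling pre_opt.step() per batch ...
    pre_sched.step()

# Episodic fine-tuning: 400 epochs of 100 few-shot episodes each,
# initial lr 0.001, cut by a factor of 10 at epochs 200 and 300.
epi_opt = SGD(model.parameters(), lr=0.001, momentum=0.9,
              nesterov=True, weight_decay=5e-4)
epi_sched = MultiStepLR(epi_opt, milestones=[200, 300], gamma=0.1)
for epoch in range(400):
    for episode in range(100):
        pass  # sample an episode, compute the loss, call epi_opt.step()
    epi_sched.step()
```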