Self-supervised Network Evolution for Few-shot Classification

Authors: Xuwen Tang, Zhu Teng, Baopeng Zhang, Jianping Fan

IJCAI 2021

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct comprehensive experiments to examine our network evolution approach against numerous state-of-the-art methods, especially in higher-way setups and cross-dataset scenarios. In this section, we first introduce our experimental setting, including datasets, implementation details, and evaluation criteria. Extensive experiments are conducted on three widely used benchmarks for the few-shot classification task, and comparisons with a number of state-of-the-art methods are reported in Section 4.2. We execute ablation studies in Section 4.3, where the contributions of the components in the SNE model are analyzed.
Researcher Affiliation Collaboration Xuwen Tang¹, Zhu Teng¹, Baopeng Zhang¹, Jianping Fan² (¹Beijing Jiaotong University, ²Lenovo Research); {19120402, zteng, bpzhang}@bjtu.edu.cn, jfan1@lenovo.com
Pseudocode No No pseudocode or algorithm blocks were found in the paper.
Open Source Code No The paper does not provide concrete access to source code for the methodology described.
Open Datasets Yes Experiments are executed on three widely used datasets for few-shot classification: miniImageNet, CIFAR-FS, and FC100. The miniImageNet dataset is a subset of ImageNet, which contains 100 classes with 600 images per class randomly selected from the 1000 classes in ImageNet. The CIFAR-FS dataset is constructed from the standard CIFAR-100 dataset, which includes 100 classes with 600 images per class.
Dataset Splits Yes The few-shot classification dataset is divided into three parts: base set (Db), validation set (Dv), and novel set (Dn), where the categories in these three sets are distinct (e.g., a category in the base set cannot be found in the novel set). The base set consists of a large number of labeled images Db = {(xi, yi), i = 1, 2, ..., mb}, where yi ∈ ybase. The novel set is composed of a relatively small amount of labeled data Dn = {(xj, yj), j = 1, 2, ..., mn}, where yj ∈ ynovel. Notice that ybase ∩ ynovel = ∅. The validation set Dv consists of classes different from both Db and Dn, and is employed to determine the hyperparameters. For the episode setting, we follow the N-way K-shot task. Each episode consists of n classes randomly selected from the dataset, a labeled support set (S) containing k images per class, and an unlabeled query set (Q) including q images per class. Both miniImageNet and CIFAR-FS are randomly split into 3 parts: 64 base classes, 16 validation classes, and 20 novel classes. The FC100 dataset is also built from the standard CIFAR-100 dataset, with 100 classes and 600 images per class. Different from the above two datasets, the classes in FC100 are split based on superclasses: base classes contain 12 superclasses (60 classes), validation classes incorporate 4 superclasses (20 classes), and novel classes comprise 4 superclasses (20 classes).
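To make the episode construction concrete, the following is a minimal sketch of N-way K-shot episode sampling as described in the split above. The helper names (build_label_index, sample_episode, label_to_indices) are illustrative assumptions, not taken from the paper or its code.

```python
# Sketch of N-way K-shot episode sampling: n classes, k labeled support
# images and q unlabeled query images per class. Helper names are hypothetical.
import random
from collections import defaultdict

def build_label_index(dataset):
    """Map each class label to the indices of its images in the dataset."""
    label_to_indices = defaultdict(list)
    for idx, (_, label) in enumerate(dataset):
        label_to_indices[label].append(idx)
    return label_to_indices

def sample_episode(label_to_indices, n_way=5, k_shot=1, q_query=15):
    """Sample one episode from a split (e.g., the novel set Dn)."""
    classes = random.sample(list(label_to_indices.keys()), n_way)
    support, query = [], []
    for cls in classes:
        indices = random.sample(label_to_indices[cls], k_shot + q_query)
        support.extend(indices[:k_shot])   # labeled support set S
        query.extend(indices[k_shot:])     # unlabeled query set Q
    return support, query
```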
Hardware Specification No The paper mentions training time but does not specify any hardware details like specific GPU or CPU models, or machine specifications used for running experiments.
Software Dependencies No The paper mentions using 'ResNet-12' as the backbone and the 'SGD optimizer' but does not provide specific version numbers for any software dependencies or libraries (e.g., TensorFlow, PyTorch, Python versions).
Experiment Setup Yes Implementation Details: We use ResNet-12 as our backbone in all experiments. The ResNet-12 contains 4 residual blocks, and each residual block consists of 3 convolutional layers with a 3x3 kernel, each followed by a BatchNorm2d layer and a ReLU layer. The first three residual blocks apply a 2x2 max-pooling layer, and the last residual block employs adaptive pooling to accommodate different input scales. The ResNet-12 finally outputs a 640-dimensional embedding. We adopt the SGD optimizer with a momentum of 0.9 and a weight decay of 5e-4. We train 100 epochs on all the datasets, with a batch size of 128. The learning rate is set to 0.05 initially and is decayed at the 60th and 80th epochs by a factor of 0.1.
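The setup above maps directly onto standard PyTorch components. Below is a minimal sketch, assuming PyTorch, of one residual block with the reported layer layout and of the optimizer/learning-rate schedule. Class and function names (ResidualBlock, make_optimizer) and the exact shortcut design are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of the described ResNet-12 building block (3 conv layers with 3x3
# kernels, each followed by BatchNorm2d and ReLU) and the reported training
# schedule. This is an assumption-laden illustration, not the paper's code.
import torch.nn as nn
import torch.optim as optim

class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, pool=True):
        super().__init__()
        layers, channels = [], in_channels
        for _ in range(3):
            layers += [nn.Conv2d(channels, out_channels, kernel_size=3, padding=1, bias=False),
                       nn.BatchNorm2d(out_channels),
                       nn.ReLU(inplace=True)]
            channels = out_channels
        self.body = nn.Sequential(*layers)
        # 1x1 shortcut projection (a common choice; the paper does not detail it).
        self.shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        # First three blocks: 2x2 max pooling; last block: adaptive pooling,
        # yielding the 640-dimensional embedding after flattening.
        self.pool = nn.MaxPool2d(2) if pool else nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        return self.pool(self.body(x) + self.shortcut(x))

def make_optimizer(model):
    """SGD, momentum 0.9, weight decay 5e-4; LR 0.05 decayed by 0.1 at epochs 60 and 80."""
    optimizer = optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)
    scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 80], gamma=0.1)
    return optimizer, scheduler
```

Under this reading, training runs for 100 epochs with a batch size of 128, stepping the scheduler once per epoch.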