MELR: Meta-Learning via Modeling Episode-Level Relationships for Few-Shot Learning

Authors: Nanyi Fei, Zhiwu Lu, Tao Xiang, Songfang Huang

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments for non-transductive standard FSL on two benchmarks show that our MELR achieves 1.0%-5.0% improvements over the baseline (i.e., ProtoNet) used for FSL in our model and outperforms the latest competitors under the same settings.
Researcher Affiliation | Collaboration | Nanyi Fei, School of Information, Renmin University of China, Beijing, China (feinanyi@ruc.edu.cn); Zhiwu Lu, Gaoling School of Artificial Intelligence, Renmin University of China, and Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China (luzhiwu@ruc.edu.cn); Tao Xiang, University of Surrey, Guildford, Surrey, UK (t.xiang@surrey.ac.uk); Songfang Huang, Alibaba DAMO Academy, Hangzhou, China (songfang.hsf@alibaba-inc.com)
Pseudocode | Yes | Algorithm 1: MELR-based FSL
Input: Our MELR model with the set of all parameters Θ; the base class sample set D_b; hyper-parameters λ, T
Output: The learned model
1: for iteration = 1, 2, ..., MaxIteration do
2:   Randomly sample e^(1) and e^(2) from D_b, satisfying that C_e^(1) = C_e^(2) and e^(1) ∩ e^(2) = ∅;
3:   Compute F̂^(1) for e^(1) using CEAM with Eq. (2), and obtain F̂^(2) with Eq. (6) similarly;
4:   Compute L_fsc(e^(1)) and L_fsc(e^(2)) with Eq. (9), respectively;
5:   Construct Q̂_e^(1,2) = Q̂_e^(1) ∪ Q̂_e^(2) based on the two episodes;
6:   Determine the teacher episode e^(t) and the student e^(s) by computing the few-shot classification accuracies of the two classifiers within e^(1) and e^(2), respectively;
7:   Compute the CECR loss L_cecr(e^(t), e^(s); T) with Eq. (7);
8:   Compute the total loss L_total with Eq. (10);
9:   Compute the gradients ∇_Θ L_total;
10:  Update Θ using stochastic gradient descent;
11: end for
12: return The learned model.
(A hedged Python sketch of this training loop is given after the table.)
Open Source Code | No | We will release the code and models soon.
Open Datasets | Yes | Two widely-used benchmarks are selected: (1) miniImageNet (Vinyals et al., 2016): it contains 100 classes from ILSVRC-12 (Russakovsky et al., 2015). (2) tieredImageNet (Ren et al., 2018): it is a larger subset of ILSVRC-12, containing 608 classes and 779,165 images in total.
Dataset Splits | Yes | (1) miniImageNet (Vinyals et al., 2016): ... We split it into 64 training classes, 16 validation classes and 20 test classes, as in (Ravi & Larochelle, 2017). (2) tieredImageNet (Ren et al., 2018): ... We split it into 351 training classes, 97 validation classes and 160 test classes, as in (Ren et al., 2018). (These split sizes are also collected as constants after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only mentions model backbones like Conv4-64, Conv4-512, and ResNet-12.
Software Dependencies | No | The paper mentions using SGD and Adam optimizers, but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, or other libraries).
Experiment Setup | Yes | The 5-way 5-shot/1-shot settings are used. Each test episode e^(test) = (S_e^(test), Q_e^(test)) has 5 classes randomly sampled from the test split, with 5 or 1 shots and 15 queries per class. We thus have N = 5, K = 5 or 1, Q = 15 as in previous works. ... For ResNet-12, the stochastic gradient descent (SGD) optimizer is employed with the initial learning rate of 1e-4, the weight decay of 5e-4, and the Nesterov momentum of 0.9. For Conv4-64 and Conv4-512, the Adam optimizer (Kingma & Ba, 2015) is adopted with the initial learning rate of 1e-4. The hyper-parameters λ and T are respectively selected from {0.02, 0.05, 0.1, 0.2} and {16, 32, 64, 128} according to the validation performances of our MELR algorithm (see Appendix A.5 for more details). (A sketch of this configuration is also given after the table.)
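
To make Algorithm 1 above concrete, the following is a minimal PyTorch-style sketch of one MELR training step over a pair of episodes that share the same classes. Because the official code has not been released, the model interface (ceam_features, fsc_loss, cecr_loss) and the way the loss terms are combined are assumptions made purely for illustration; the true forms are given by Eqs. (2), (6), (7), (9) and (10) in the paper.

    def melr_step(model, optimizer, episode1, episode2, lam=0.1, temperature=64.0):
        """One MELR update (Algorithm 1, steps 2-10) over two episodes with the same classes."""
        # Step 3: cross-episode attention (CEAM) features, each episode attending
        # to the other (Eqs. (2) and (6)); method name is hypothetical.
        feats1 = model.ceam_features(episode1, episode2)
        feats2 = model.ceam_features(episode2, episode1)

        # Step 4: per-episode few-shot classification losses and accuracies (Eq. (9));
        # the (loss, accuracy) return signature is assumed.
        loss1, acc1 = model.fsc_loss(feats1, episode1)
        loss2, acc2 = model.fsc_loss(feats2, episode2)

        # Step 6: the episode whose classifier is more accurate acts as the teacher.
        teacher, student = (feats1, feats2) if acc1 >= acc2 else (feats2, feats1)

        # Step 7: cross-episode consistency regularization loss with temperature T (Eq. (7)).
        cecr = model.cecr_loss(teacher, student, temperature)

        # Steps 8-10: combine the losses (the exact weighting is Eq. (10); the sum
        # below is only illustrative), back-propagate, and update the parameters.
        total = loss1 + loss2 + lam * cecr
        optimizer.zero_grad()
        total.backward()
        optimizer.step()
        return total.item()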
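
The class splits quoted in the Dataset Splits row, restated as plain constants for quick reference (this dictionary is only a summary of the reported numbers, not an official configuration file):

    # miniImageNet: 100 classes in total; tieredImageNet: 608 classes in total.
    SPLITS = {
        "miniImageNet":   {"train": 64,  "val": 16, "test": 20},
        "tieredImageNet": {"train": 351, "val": 97, "test": 160},
    }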
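
Similarly, the episode settings and optimizer choices in the Experiment Setup row can be summarized in the sketch below. The assumption that PyTorch was used and the helper name build_optimizer are ours; only the numeric values come from the paper.

    import torch

    N_WAY, N_QUERY = 5, 15   # 5-way episodes with 15 queries per class
    K_SHOT = 5               # or 1 for the 1-shot setting

    def build_optimizer(model, backbone="ResNet-12"):
        if backbone == "ResNet-12":
            # SGD with initial lr 1e-4, weight decay 5e-4, Nesterov momentum 0.9.
            return torch.optim.SGD(model.parameters(), lr=1e-4,
                                   weight_decay=5e-4, momentum=0.9, nesterov=True)
        # Conv4-64 and Conv4-512 use Adam with initial lr 1e-4.
        return torch.optim.Adam(model.parameters(), lr=1e-4)

    # Hyper-parameter grids searched on the validation split (Appendix A.5).
    LAMBDA_GRID = [0.02, 0.05, 0.1, 0.2]   # lambda
    TEMPERATURE_GRID = [16, 32, 64, 128]   # T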