MELR: Meta-Learning via Modeling Episode-Level Relationships for Few-Shot Learning
Authors: Nanyi Fei, Zhiwu Lu, Tao Xiang, Songfang Huang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments for non-transductive standard FSL on two benchmarks show that our MELR achieves 1.0%–5.0% improvements over the baseline (i.e., ProtoNet) used for FSL in our model and outperforms the latest competitors under the same settings. |
| Researcher Affiliation | Collaboration | Nanyi Fei School of Information, Renmin University of China, Beijing, China feinanyi@ruc.edu.cn Zhiwu Lu Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China luzhiwu@ruc.edu.cn Tao Xiang University of Surrey Guildford, Surrey, UK t.xiang@surrey.ac.uk Songfang Huang Alibaba DAMO Academy Hangzhou, China songfang.hsf@alibaba-inc.com |
| Pseudocode | Yes | Algorithm 1 MELR-based FSL. Input: our MELR model with the set of all parameters Θ; the base class sample set D_b; hyper-parameters λ, T. Output: the learned model. 1: for iteration = 1, 2, ..., MaxIteration do 2: Randomly sample e^(1) and e^(2) from D_b, satisfying that C_e^(1) = C_e^(2) and e^(1) ∩ e^(2) = ∅; 3: Compute F̂^(1) for e^(1) using CEAM with Eq. (2), and obtain F̂^(2) with Eq. (6) similarly; 4: Compute L_fsc(e^(1)) and L_fsc(e^(2)) with Eq. (9), respectively; 5: Construct Q̂_e^(1,2) = Q̂_e^(1) ∪ Q̂_e^(2) based on the two episodes; 6: Determine the teacher episode e^(t) and the student e^(s) by computing the few-shot classification accuracies of the two classifiers within e^(1) and e^(2), respectively; 7: Compute the CECR loss L_cecr(e^(t), e^(s); T) with Eq. (7); 8: Compute the total loss L_total with Eq. (10); 9: Compute the gradients ∇_Θ L_total; 10: Update Θ using stochastic gradient descent; 11: end for 12: return the learned model. (A minimal episode-pair sampling sketch is given after this table.) |
| Open Source Code | No | We will release the code and models soon. |
| Open Datasets | Yes | Two widely-used benchmarks are selected: (1) miniImageNet (Vinyals et al., 2016): It contains 100 classes from ILSVRC-12 (Russakovsky et al., 2015). (2) tieredImageNet (Ren et al., 2018): It is a larger subset of ILSVRC-12, containing 608 classes and 779,165 images in total. |
| Dataset Splits | Yes | (1) miniImageNet (Vinyals et al., 2016): ... We split it into 64 training classes, 16 validation classes and 20 test classes, as in (Ravi & Larochelle, 2017). (2) tieredImageNet (Ren et al., 2018): ... We split it into 351 training classes, 97 validation classes and 160 test classes, as in (Ren et al., 2018). |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only mentions model backbones like Conv4-64, Conv4-512, and ResNet-12. |
| Software Dependencies | No | The paper mentions using SGD and Adam optimizers, but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, or other libraries). |
| Experiment Setup | Yes | The 5-way 5-shot/1-shot settings are used. Each test episode e^(test) = (S_e^(test), Q_e^(test)) has 5 classes randomly sampled from the test split, with 5 or 1 shots and 15 queries per class. We thus have N = 5, K = 5 or 1, Q = 15 as in previous works. ... For ResNet-12, the stochastic gradient descent (SGD) optimizer is employed with the initial learning rate of 1e-4, the weight decay of 5e-4, and the Nesterov momentum of 0.9. For Conv4-64 and Conv4-512, the Adam optimizer (Kingma & Ba, 2015) is adopted with the initial learning rate of 1e-4. The hyper-parameters λ and T are respectively selected from {0.02, 0.05, 0.1, 0.2} and {16, 32, 64, 128} according to the validation performances of our MELR algorithm (see Appendix A.5 for more details). (An optimizer-settings sketch is given after the table.) |
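The pseudocode row above relies on paper-specific components (CEAM, CECR, Eqs. 2–10) that cannot be reconstructed from this table alone, so the snippet below only illustrates step 2 of Algorithm 1: drawing a pair of episodes that share the same classes but no samples. This is a minimal sketch, not the authors' data pipeline; `samples_by_class`, `sample_episode_pair`, and the toy indices are hypothetical placeholders.

```python
import random

def sample_episode_pair(samples_by_class, n_way=5, k_shot=5, n_query=15, rng=random):
    """Return (e1, e2): two N-way K-shot episodes over the same classes with
    disjoint samples. Each episode maps class -> (support_ids, query_ids)."""
    # Same class set for both episodes: C_e^(1) = C_e^(2)
    classes = rng.sample(sorted(samples_by_class), n_way)
    per_class = k_shot + n_query
    e1, e2 = {}, {}
    for c in classes:
        # Draw 2*(K+Q) distinct samples per class, then split them so that
        # the two episodes are disjoint: e^(1) ∩ e^(2) = ∅
        drawn = rng.sample(samples_by_class[c], 2 * per_class)
        e1[c] = (drawn[:k_shot], drawn[k_shot:per_class])
        e2[c] = (drawn[per_class:per_class + k_shot], drawn[per_class + k_shot:])
    return e1, e2

# Toy usage with hypothetical per-class sample indices (64 base classes)
data = {f"class_{i}": list(range(i * 1000, i * 1000 + 600)) for i in range(64)}
ep1, ep2 = sample_episode_pair(data)
```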
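The optimizer settings quoted in the Experiment Setup row can be written down as a short PyTorch sketch. Only the hyper-parameter values come from the paper; the backbone modules below are hypothetical stand-ins, since the table does not include the actual network definitions.

```python
import torch

# Placeholders for the backbones; the real ResNet-12 / Conv4 definitions are not
# reproduced in this table.
resnet12 = torch.nn.Linear(640, 64)
conv4 = torch.nn.Linear(1600, 64)

# ResNet-12: SGD with lr 1e-4, weight decay 5e-4, Nesterov momentum 0.9
opt_resnet12 = torch.optim.SGD(resnet12.parameters(), lr=1e-4,
                               weight_decay=5e-4, momentum=0.9, nesterov=True)

# Conv4-64 / Conv4-512: Adam with lr 1e-4
opt_conv4 = torch.optim.Adam(conv4.parameters(), lr=1e-4)

# Hyper-parameter grids searched on the validation split
lambda_grid = [0.02, 0.05, 0.1, 0.2]
temperature_grid = [16, 32, 64, 128]
```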