MELR: Meta-Learning via Modeling Episode-Level Relationships for Few-Shot Learning
Authors: Nanyi Fei, Zhiwu Lu, Tao Xiang, Songfang Huang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments for non-transductive standard FSL on two benchmarks show that our MELR achieves 1.0%–5.0% improvements over the baseline (i.e., ProtoNet) used for FSL in our model and outperforms the latest competitors under the same settings. |
| Researcher Affiliation | Collaboration | Nanyi Fei School of Information, Renmin University of China, Beijing, China feinanyi@ruc.edu.cn Zhiwu Lu Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China luzhiwu@ruc.edu.cn Tao Xiang University of Surrey Guildford, Surrey, UK t.xiang@surrey.ac.uk Songfang Huang Alibaba DAMO Academy Hangzhou, China songfang.hsf@alibaba-inc.com |
| Pseudocode | Yes | Algorithm 1 MELR-based FSL. Input: our MELR model with the set of all parameters Θ; the base class sample set D_b; hyper-parameters λ, T. Output: the learned model. 1: for iteration = 1, 2, ..., MaxIteration do 2: Randomly sample e^(1) and e^(2) from D_b, satisfying that C_e^(1) = C_e^(2) and e^(1) ∩ e^(2) = ∅; 3: Compute F̂^(1) for e^(1) using CEAM with Eq. (2), and obtain F̂^(2) with Eq. (6) similarly; 4: Compute L_fsc(e^(1)) and L_fsc(e^(2)) with Eq. (9), respectively; 5: Construct Q̂_e^(1,2) = Q̂_e^(1) ∪ Q̂_e^(2) based on the two episodes; 6: Determine the teacher episode e^(t) and the student e^(s) by computing the few-shot classification accuracies of the two classifiers within e^(1) and e^(2), respectively; 7: Compute the CECR loss L_cecr(e^(t), e^(s); T) with Eq. (7); 8: Compute the total loss L_total with Eq. (10); 9: Compute the gradients ∇_Θ L_total; 10: Update Θ using stochastic gradient descent; 11: end for 12: return the learned model. (A minimal episode-pair sampling sketch is given after this table.) |
| Open Source Code | No | We will release the code and models soon. |
| Open Datasets | Yes | Two widely-used benchmarks are selected: (1) miniImageNet (Vinyals et al., 2016): It contains 100 classes from ILSVRC-12 (Russakovsky et al., 2015). (2) tieredImageNet (Ren et al., 2018): It is a larger subset of ILSVRC-12, containing 608 classes and 779,165 images in total. |
| Dataset Splits | Yes | (1) miniImageNet (Vinyals et al., 2016): ... We split it into 64 training classes, 16 validation classes and 20 test classes, as in (Ravi & Larochelle, 2017). (2) tieredImageNet (Ren et al., 2018): ... We split it into 351 training classes, 97 validation classes and 160 test classes, as in (Ren et al., 2018). |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. It only mentions model backbones like Conv4-64, Conv4-512, and ResNet-12. |
| Software Dependencies | No | The paper mentions using SGD and Adam optimizers, but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, or other libraries). |
| Experiment Setup | Yes | The 5-way 5-shot/1-shot settings are used. Each test episode e^(test) = (S_e^(test), Q_e^(test)) has 5 classes randomly sampled from the test split, with 5 or 1 shots and 15 queries per class. We thus have N = 5, K = 5 or 1, Q = 15 as in previous works. ... For ResNet-12, the stochastic gradient descent (SGD) optimizer is employed with the initial learning rate of 1e-4, the weight decay of 5e-4, and the Nesterov momentum of 0.9. For Conv4-64 and Conv4-512, the Adam optimizer (Kingma & Ba, 2015) is adopted with the initial learning rate of 1e-4. The hyper-parameters λ and T are respectively selected from {0.02, 0.05, 0.1, 0.2} and {16, 32, 64, 128} according to the validation performances of our MELR algorithm (see Appendix A.5 for more details). (An optimizer-settings sketch is given after the table.) |
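The pseudocode row above relies on paper-specific components (CEAM, CECR, Eqs. 2–10) that cannot be reconstructed from this table alone, so the snippet below only illustrates step 2 of Algorithm 1: drawing a pair of episodes that share the same classes but no samples. This is a minimal sketch, not the authors' data pipeline; `samples_by_class`, `sample_episode_pair`, and the toy indices are hypothetical placeholders.

```python
import random

def sample_episode_pair(samples_by_class, n_way=5, k_shot=5, n_query=15, rng=random):
    """Return (e1, e2): two N-way K-shot episodes over the same classes with
    disjoint samples. Each episode maps class -> (support_ids, query_ids)."""
    # Same class set for both episodes: C_e^(1) = C_e^(2)
    classes = rng.sample(sorted(samples_by_class), n_way)
    per_class = k_shot + n_query
    e1, e2 = {}, {}
    for c in classes:
        # Draw 2*(K+Q) distinct samples per class, then split them so that
        # the two episodes are disjoint: e^(1) ∩ e^(2) = ∅
        drawn = rng.sample(samples_by_class[c], 2 * per_class)
        e1[c] = (drawn[:k_shot], drawn[k_shot:per_class])
        e2[c] = (drawn[per_class:per_class + k_shot], drawn[per_class + k_shot:])
    return e1, e2

# Toy usage with hypothetical per-class sample indices (64 base classes)
data = {f"class_{i}": list(range(i * 1000, i * 1000 + 600)) for i in range(64)}
ep1, ep2 = sample_episode_pair(data)
```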
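The optimizer settings quoted in the Experiment Setup row can be written down as a short PyTorch sketch. Only the hyper-parameter values come from the paper; the backbone modules below are hypothetical stand-ins, since the table does not include the actual network definitions.

```python
import torch

# Placeholders for the backbones; the real ResNet-12 / Conv4 definitions are not
# reproduced in this table.
resnet12 = torch.nn.Linear(640, 64)
conv4 = torch.nn.Linear(1600, 64)

# ResNet-12: SGD with lr 1e-4, weight decay 5e-4, Nesterov momentum 0.9
opt_resnet12 = torch.optim.SGD(resnet12.parameters(), lr=1e-4,
                               weight_decay=5e-4, momentum=0.9, nesterov=True)

# Conv4-64 / Conv4-512: Adam with lr 1e-4
opt_conv4 = torch.optim.Adam(conv4.parameters(), lr=1e-4)

# Hyper-parameter grids searched on the validation split
lambda_grid = [0.02, 0.05, 0.1, 0.2]
temperature_grid = [16, 32, 64, 128]
```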