Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

On the Stability and Generalization of Meta-Learning: the Impact of Inner-Levels

Authors: Wenjun Ding, Jingling Liu, Lixing Chen, Xiu Su, Tao Sun, Fan Wu, Zhe Qu

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Many real-world experiments support our findings and show the improvement of the new meta-objective function. To validate the impact of Q on the two frameworks, we conducted two simple experiments on the Omniglot dataset [22] using MAML [5] and Meta-Minibatch Prox [7] as examples. Extensive experiments confirm the efficiency of the proposed objective.
Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, Central South University; (2) Xiangjiang Laboratory; (3) School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University; (4) Shanghai Key Laboratory of Integrated Administration Technologies for Information Security; (5) Big Data Institute, Central South University; (6) College of Computer, National University of Defense Technology
Pseudocode | Yes | Algorithm 1: GDF and PDF; Algorithm 2: MAML; Algorithm 3: FOMAML; Algorithm 4: Meta SGD; Algorithm 5: iMAML; Algorithm 6: Meta-Minibatch Prox; Algorithm 7: Fo Mu ML
Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: Due to some reasons, we don't provide open access to the code.
Open Datasets | Yes | Few-shot classification. We follow the standard experimental setup described in [22] using the real-world Omniglot dataset, which comprises 1,623 characters from 50 different alphabets, with each character having 20 instances drawn by different individuals. [22] Brenden M Lake, Ruslan Salakhutdinov, and Joshua B Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015.
Dataset Splits | Yes | For each training task, we set the number of support samples and query samples to ntr = 5 and nts = 5, respectively, while for each test task, we use ntr = 5, nts = 15. For each task, the number of support samples and query samples to ntr = 1 and nts = 5, respectively. For each test task, we use ntr = 1, nts = 15. We fix the training task number m = 100 and generate 10000 new tasks at test time by using the standard library [52].
Hardware Specification | No | Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [No] Justification: Our experiment has low requirements for configuration.
Software Dependencies | No | The paper only references a standard library [52], which is Torchmeta, but does not specify a version number for it or for any other key software component.
Experiment Setup | Yes | For each training task, we set the number of support samples and query samples to ntr = 5 and nts = 5, respectively, while for each test task, we use ntr = 5, nts = 15. We employ a 3×3 CNN to align with [5, 12], and use the Cross-Entropy Loss as the loss function. For each task, the number of support samples and query samples to ntr = 1 and nts = 5, respectively. For each test task, we use ntr = 1, nts = 15. In addition, each task is formulated as a 5-way classification problem. In particular, we set Q = 5 for the convex setting and Q = 10 for the non-convex setting.
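
The task construction quoted under Dataset Splits and Experiment Setup (5-way tasks with ntr support and nts query samples) can be illustrated with a minimal sketch. This is not the authors' code, which is not released, and it does not use the Torchmeta API that the paper cites as [52]. The function name `sample_episode` and the reading of ntr and nts as per-class counts are assumptions made only for illustration.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way=5, n_support=5, n_query=5, rng=None):
    """Sample one N-way episode from a list of class labels.

    Hypothetical helper (not from the paper or Torchmeta).
    Assumption: n_support and n_query are counted per class.
    Returns (support, query), each a list of (dataset_index, episode_label).
    """
    if rng is None:
        rng = random.Random()

    # Group dataset indices by their class label.
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    # Only classes with enough instances can fill both sets without overlap.
    needed = n_support + n_query
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= needed]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with the required number of samples")

    support, query = [], []
    for episode_label, c in enumerate(rng.sample(eligible, n_way)):
        picked = rng.sample(by_class[c], needed)
        support += [(i, episode_label) for i in picked[:n_support]]
        query += [(i, episode_label) for i in picked[n_support:]]
    return support, query


# Toy stand-in for Omniglot: 1,623 classes with 20 instances each.
toy_labels = [c for c in range(1623) for _ in range(20)]

# A test-time task in the quoted 5-shot setting: ntr = 5, nts = 15.
support, query = sample_episode(toy_labels, 5, 5, 15, random.Random(0))
```

Under the per-class assumption, a test task with ntr = 5 and nts = 15 uses all 20 instances of each sampled character, so Omniglot's 20 instances per character are exactly enough.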