Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Memory-Based Optimization Methods for Model-Agnostic Meta-Learning and Personalized Federated Learning
Authors: Bokun Wang, Zhuoning Yuan, Yiming Ying, Tianbao Yang
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of our proposed algorithms, MOMLv1, MOMLv2 and Local MOML, on sinewave regression and one-shot classification tasks in the single-node setting. Furthermore, we demonstrate the effectiveness of Local MOML in the simulated federated learning setting for the image classification task. |
| Researcher Affiliation | Academia | Bokun Wang (EMAIL), Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843, USA; Zhuoning Yuan (EMAIL), Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA; Yiming Ying (EMAIL), Department of Mathematics and Statistics, University at Albany, Albany, NY 12222, USA; Tianbao Yang (EMAIL), Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843, USA |
| Pseudocode | Yes | Algorithm 1 MOMLv1, Algorithm 2 MOMLv2, Algorithm 3 Local MOML |
| Open Source Code | Yes | Interested readers can access our code at https://github.com/bokun-wang/moml. |
| Open Datasets | Yes | We evaluate the performance of our proposed algorithms, MOMLv1, MOMLv2 and Local MOML, on sinewave regression and one-shot classification tasks in the single-node setting... on the sinewave regression problem (Finn et al., 2017)... on the Omniglot and CIFAR-100 datasets... We consider three data sets, MNIST, CIFAR-10, and CIFAR-100. |
| Dataset Splits | Yes | We generate 25 tasks for training in total... The training and validation data of each task are randomly sampled in an online manner... adapted to 5 randomly sampled unseen tasks... 10 test data points... another 100 data points from the unseen task... For the Omniglot dataset, we randomly select 25 tasks for training and 10 tasks for testing... For the CIFAR-100 dataset, we randomly select 17 tasks for training and 3 tasks for testing... we distribute the training data between N = 50 clients (tasks)... Similarly, we divide the test data among the clients with the same distribution as the one for the training data. We set a = 68 for constructing the distributed training sets of MNIST, CIFAR-10, and CIFAR-100, and set a = 34 for constructing the test sets of MNIST and CIFAR-10 and a = 15 for constructing the test sets of CIFAR-100. |
| Hardware Specification | No | We conduct experiments on four GPUs to mimic the cross-device federated learning setting, where all 50 tasks are distributed to the four GPUs roughly evenly. The paper mentions using 'four GPUs' but does not specify the exact GPU models, memory, or any other hardware details. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | The inner step size α is set to 0.01 for all algorithms. The outer step size η is decayed 10 times at 75% of the total iterations and its initial value is tuned for the algorithms separately by grid search in {0.1, 0.05, 0.01, 0.005, 0.001}. We also tune β for MOMLv1, MOMLv2 and Local MOML. It turns out that β = 0.3 and β = 0.5 work reasonably well for MOMLv1 and Local MOML while β = 0.1 is good for MOMLv2. For Local MOML, we set the size of the initial number of samples K0 of each round to be 2 times K and H = 5. We use α = 0.001 and the step size for the considered algorithms is tuned in a range similar to before. For all algorithms, we consider two settings of H = 4 and H = 10. The minibatch size at every iteration (including the initial one at each round) is set to 5, that is, K = 5, K0 = 5. We tune the β in a range [0.1, 0.9], and run a total of 10000 iterations. For pFedMe, we tune its hyperparameter λ = 100 and set the number of steps to be 50 to solve the sub-problem accurately enough. |
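The reported outer step-size schedule (decay by a factor of 10 once 75% of the total iterations have elapsed, with the initial value chosen by grid search) can be sketched as follows; the function name and signature are illustrative, not taken from the paper or its code release:

```python
# Grid of initial outer step sizes searched per algorithm, per the reported setup.
ETA0_GRID = [0.1, 0.05, 0.01, 0.005, 0.001]

def outer_step_size(eta0: float, t: int, total_iters: int) -> float:
    """Outer step size at iteration t: the initial value eta0 until 75% of
    total_iters have elapsed, then decayed by a factor of 10."""
    return eta0 / 10.0 if t >= 0.75 * total_iters else eta0
```

For example, with eta0 = 0.1 and 10000 total iterations, the schedule returns 0.1 for the first 7500 iterations and 0.01 thereafter.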