A Sample Complexity Separation between Non-Convex and Convex Meta-Learning

Authors: Nikunj Saunshi, Yi Zhang, Mikhail Khodak, Sanjeev Arora

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This work shows that convex-case analysis might be insufficient to understand the success of meta-learning, and that even for non-convex models it is important to look inside the optimization black-box, specifically at properties of the optimization trajectory. We construct a simple meta-learning instance that captures the problem of one-dimensional subspace learning. For the convex formulation of linear regression on this instance, we show that the new task sample complexity of any initialization-based meta-learning algorithm is Ω(d), where d is the input dimension. In contrast, for the non-convex formulation of a two-layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of O(1), demonstrating a separation from convex meta-learning. Crucially, analyses of the training dynamics of these methods reveal that they can meta-learn the correct subspace onto which the data should be projected. (Illustrative sketches of this construction and of the Reptile update appear after the table.)
Researcher Affiliation | Academia | Princeton University, Princeton, New Jersey, USA; Carnegie Mellon University, Pittsburgh, Pennsylvania, USA; Institute for Advanced Study, Princeton, New Jersey, USA.
Pseudocode | No | The paper describes algorithms in prose (e.g., "Reptile: Starting from (A0, w0), the initialization maintained by the algorithm is sequentially updated...") but does not include formal pseudocode blocks or labeled algorithm figures. (A hedged sketch of this Reptile update follows the table.)
Open Source Code | No | The paper does not contain any statement about releasing source code or provide links to a code repository for the methodology described.
Open Datasets | No | The paper constructs a theoretical "simple meta-learning instance" for its analysis, rather than using a publicly available dataset with concrete access information. For example, it states: "The meta-learning instance µ_{w*} is defined as uniform distribution over two tasks w* and -w* for a fixed but unknown vector w* ∈ R^d." (The data-generation sketch after the table illustrates this construction.)
Dataset Splits | No | The paper conducts theoretical analysis and does not involve empirical experiments with training, validation, or test dataset splits.
Hardware Specification | No | The paper does not describe any specific hardware used to run experiments or analyses.
Software Dependencies | No | The paper does not list any specific software dependencies with version numbers.
Experiment Setup | No | The paper focuses on theoretical analysis and does not describe experimental setup details such as hyperparameters or training configurations.
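To make the construction referenced in the Research Type and Open Datasets rows concrete, here is a minimal, hypothetical sketch of the one-dimensional-subspace instance: a uniform distribution over two linear-regression tasks whose regression vectors are w* and -w*. The dimension, noise level, and sample count below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Hypothetical sketch (not the paper's code) of the constructed instance:
# a uniform distribution over two linear-regression tasks whose regression
# vectors are w_star and -w_star, so every task lies in a 1-D subspace.

def sample_task(w_star, rng):
    """Pick one of the two tasks uniformly at random."""
    sign = rng.choice([-1.0, 1.0])
    return sign * w_star

def sample_task_data(w_task, n, sigma, rng):
    """Draw n examples (x, y) with y = <w_task, x> + Gaussian noise."""
    d = w_task.shape[0]
    X = rng.standard_normal((n, d))
    y = X @ w_task + sigma * rng.standard_normal(n)
    return X, y

rng = np.random.default_rng(0)
d = 50                                   # input dimension (illustrative)
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)         # fixed but unknown unit vector

w_task = sample_task(w_star, rng)
X, y = sample_task_data(w_task, n=10, sigma=0.1, rng=rng)
print(X.shape, y.shape)                  # (10, 50) (10,)
```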
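Likewise, the Reptile procedure quoted in the Pseudocode row can be sketched for the two-layer linear network f(x) = <w, A x>: adapt (A, w) on a sampled task with a few gradient steps, then move the maintained initialization toward the adapted parameters. This is a hedged illustration of that general update pattern; the width k, step counts, and learning rates are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def reptile_meta_step(A, w, X, y, inner_steps=5, inner_lr=0.01, meta_lr=0.1):
    """Adapt (A, w) on one task with a few gradient steps on the squared
    loss of f(x) = w^T (A x), then move the maintained initialization
    toward the adapted parameters (the Reptile interpolation step)."""
    A_t, w_t = A.copy(), w.copy()
    n = X.shape[0]
    for _ in range(inner_steps):
        resid = X @ A_t.T @ w_t - y                    # residuals, shape (n,)
        grad_A = (2.0 / n) * np.outer(w_t, X.T @ resid)
        grad_w = (2.0 / n) * A_t @ (X.T @ resid)
        A_t -= inner_lr * grad_A
        w_t -= inner_lr * grad_w
    # Reptile update: init <- init + meta_lr * (adapted - init)
    return A + meta_lr * (A_t - A), w + meta_lr * (w_t - w)

rng = np.random.default_rng(0)
d, k = 50, 10                                          # k is an assumed width
A = rng.standard_normal((k, d)) / np.sqrt(d)           # initialization A_0
w = rng.standard_normal(k) / np.sqrt(k)                # initialization w_0

w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)

for _ in range(1000):                                  # stream of training tasks
    w_task = rng.choice([-1.0, 1.0]) * w_star          # uniform over {w*, -w*}
    X = rng.standard_normal((10, d))
    y = X @ w_task
    A, w = reptile_meta_step(A, w, X, y)
```

Per the abstract, the paper's analysis of such training dynamics shows that these methods can meta-learn the correct one-dimensional subspace (the span of w*), which is what enables O(1)-sample adaptation on a new task; the sketch above only illustrates the update rule, not those formal guarantees.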