A Sample Complexity Separation between Non-Convex and Convex Meta-Learning
Authors: Nikunj Saunshi, Yi Zhang, Mikhail Khodak, Sanjeev Arora
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This work shows that convex-case analysis might be insufficient to understand the success of meta-learning, and that even for non-convex models it is important to look inside the optimization black-box, specifically at properties of the optimization trajectory. We construct a simple meta-learning instance that captures the problem of one-dimensional subspace learning. For the convex formulation of linear regression on this instance, we show that the new task sample complexity of any initialization-based meta-learning algorithm is Ω(d), where d is the input dimension. In contrast, for the non-convex formulation of a two-layer linear network on the same instance, we show that both Reptile and multi-task representation learning can have new task sample complexity of O(1), demonstrating a separation from convex meta-learning. Crucially, analyses of the training dynamics of these methods reveal that they can meta-learn the correct subspace onto which the data should be projected. (See the illustrative sketch of this instance after the table.) |
| Researcher Affiliation | Academia | ¹Princeton University, Princeton, New Jersey, USA; ²Carnegie Mellon University, Pittsburgh, Pennsylvania, USA; ³Institute for Advanced Study, Princeton, New Jersey, USA. |
| Pseudocode | No | The paper describes algorithms in prose (e.g., "Reptile: Starting from (A0, w0), the initialization maintained by the algorithm is sequentially updated...") but does not include formal pseudocode blocks or labeled algorithm figures. (A hedged code sketch of the Reptile update appears after this table.) |
| Open Source Code | No | The paper does not contain any statement about releasing source code or provide links to a code repository for the methodology described. |
| Open Datasets | No | The paper constructs a theoretical "simple meta-learning instance" for its analysis, rather than using a publicly available dataset with concrete access information. For example, it states: "The meta-learning instance µ_w is defined as the uniform distribution over two tasks ρ_w and ρ_{−w} for a fixed but unknown vector w ∈ ℝ^d." |
| Dataset Splits | No | The paper conducts theoretical analysis and does not involve empirical experiments with training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not describe any specific hardware used to run experiments or analyses. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper focuses on theoretical analysis and does not describe experimental setup details such as hyperparameters or training configurations. |
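For concreteness, the following is a minimal, hypothetical sketch of the meta-learning instance described in the abstract: a uniform distribution over the two linear-regression tasks ρ_w and ρ_{−w} for a fixed but unknown unit vector w ∈ ℝ^d. The dimension, noise model, and all function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 50                                  # input dimension (illustrative choice)
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)        # the fixed but unknown direction w

def sample_task():
    """Draw a task from mu_w: uniform over the two regressions rho_w and rho_{-w}."""
    return rng.choice([-1.0, 1.0]) * w_star

def sample_data(task_w, n, noise_std=0.1):
    """Linear-regression samples y = <task_w, x> + noise (assumed noise model)."""
    X = rng.standard_normal((n, d))
    y = X @ task_w + noise_std * rng.standard_normal(n)
    return X, y
```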
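Likewise, since the paper gives no formal pseudocode, here is a hedged sketch of the Reptile update on a two-layer linear network f(x) = wᵀAx, continuing from the helpers above. The hidden width k, step counts, and learning rates are arbitrary placeholders; the paper's analysis may use different parameterizations and schedules.

```python
def inner_gd(A, w, X, y, lr=0.01, steps=20):
    """Task adaptation: gradient descent on the squared loss of f(x) = w^T A x."""
    A, w = A.copy(), w.copy()
    n = len(y)
    for _ in range(steps):
        resid = X @ A.T @ w - y          # prediction residuals, shape (n,)
        grad_A = np.outer(w, resid @ X) / n
        grad_w = A @ (X.T @ resid) / n
        A -= lr * grad_A
        w -= lr * grad_w
    return A, w

# Reptile meta-training: nudge the initialization (A0, w0) toward the
# task-adapted parameters after each sampled task.
k = 10                                   # hidden width (illustrative choice)
A0 = rng.standard_normal((k, d)) / np.sqrt(d)
w0 = rng.standard_normal(k) / np.sqrt(k)
meta_lr = 0.5
for _ in range(500):
    X, y = sample_data(sample_task(), n=20)
    A_t, w_t = inner_gd(A0, w0, X, y)
    A0 += meta_lr * (A_t - A0)           # Reptile: move initialization toward A_t
    w0 += meta_lr * (w_t - w0)
```

Under the paper's separation result, an initialization meta-trained this way should let a new ±w task be fit from O(1) samples, whereas any initialization for the convex (direct linear-regression) formulation needs Ω(d) samples.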