Provable Meta-Learning of Linear Representations

Authors: Nilesh Tripuraneni, Chi Jin, Michael Jordan

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 6. Simulations: We complement our theoretical analysis with a series of numerical experiments highlighting the benefits (and limits) of meta-learning. For the purposes of feature learning we compare the performance of the method-of-moments estimator in Algorithm 1 vs. directly optimizing the objective in (4). Additional details on our set-up are provided in Appendix G. We construct problem instances by generating Gaussian covariates and noise as xi ∼ N(0, Id), ϵi ∼ N(0, 1), and the tasks and features used for the first-stage feature estimation as αi ∼ √(1/r) N(0, Ir), with B generated as a (uniform) random r-dimensional subspace of Rd. In all our experiments we generate an equal number of samples nt for each of the t tasks, so n1 = t · nt. In the second stage we generate a new, (t+1)-st task instance using the same feature estimate B used in the first stage and otherwise generate n2 samples, with the covariates, noise, and αt+1 constructed as before. Throughout this section we refer to features learned via a first-order gradient method as LF-FO and the corresponding meta-learned regression parameter on a new task by meta-LR-FO. We use LF-MoM and meta-LR-MoM to refer to the same quantities save with the feature estimate learned via the method-of-moments estimator. We also use LR to refer to the baseline linear regression estimator on a new task which only uses data generated from that task. (A sketch of this data-generating process appears after the table.)
Researcher Affiliation | Academia | Nilesh Tripuraneni¹, Chi Jin², Michael I. Jordan¹. ¹Department of EECS, University of California, Berkeley; ²Department of Electrical Engineering, Princeton University.
Pseudocode | Yes | Algorithm 1 (MoM Estimator for Learning Linear Features) and Algorithm 2 (Linear Regression for Learning a New Task with a Feature Estimate). (A sketch of both algorithms appears after the table.)
Open Source Code | Yes | An open-source Python implementation to reproduce our experiments can be found at https://github.com/nileshtrip/MTL.
Open Datasets | No | We construct problem instances by generating Gaussian covariates and noise as xi ∼ N(0, Id), ϵi ∼ N(0, 1), and the tasks and features used for the first-stage feature estimation as αi ∼ √(1/r) N(0, Ir), with B generated as a (uniform) random r-dimensional subspace of Rd. In all our experiments we generate an equal number of samples nt for each of the t tasks, so n1 = t · nt. In the second stage we generate a new, (t+1)-st task instance using the same feature estimate B used in the first stage and otherwise generate n2 samples, with the covariates, noise, and αt+1 constructed as before. (No publicly available dataset is used; rather, data is simulated.)
Dataset Splits | No | We construct problem instances by generating Gaussian covariates and noise as xi ∼ N(0, Id), ϵi ∼ N(0, 1), and the tasks and features used for the first-stage feature estimation as αi ∼ √(1/r) N(0, Ir), with B generated as a (uniform) random r-dimensional subspace of Rd. In all our experiments we generate an equal number of samples nt for each of the t tasks, so n1 = t · nt. In the second stage we generate a new, (t+1)-st task instance using the same feature estimate B used in the first stage and otherwise generate n2 samples, with the covariates, noise, and αt+1 constructed as before. (The paper simulates data and uses "meta-train" and "meta-test" phases, but does not explicitly detail train/validation/test splits with percentages or counts for a pre-existing dataset.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models) used for running the simulations.
Software Dependencies | No | The paper mentions "an open-source Python implementation" but does not provide specific version numbers for Python or any software libraries used.
Experiment Setup | Yes | We construct problem instances by generating Gaussian covariates and noise as xi ∼ N(0, Id), ϵi ∼ N(0, 1), and the tasks and features used for the first-stage feature estimation as αi ∼ √(1/r) N(0, Ir), with B generated as a (uniform) random r-dimensional subspace of Rd. In all our experiments we generate an equal number of samples nt for each of the t tasks, so n1 = t · nt. In the second stage we generate a new, (t+1)-st task instance using the same feature estimate B used in the first stage and otherwise generate n2 samples, with the covariates, noise, and αt+1 constructed as before. Throughout this section we refer to features learned via a first-order gradient method as LF-FO and the corresponding meta-learned regression parameter on a new task by meta-LR-FO. We use LF-MoM and meta-LR-MoM to refer to the same quantities save with the feature estimate learned via the method-of-moments estimator. We also use LR to refer to the baseline linear regression estimator on a new task which only uses data generated from that task. We begin by considering a challenging setting for feature learning where d = 100, r = 5, but nt = 5 for varying numbers of tasks t. We now turn to the more interesting use cases where meta-learning is a powerful tool. We consider a setting where d = 100, r = 5, and nt = 25 for varying numbers of tasks t. However, now we consider a new, unseen task where data is scarce: n2 = 25 < d. Finally, we consider an instance where d = 100, r = 5, t = 20, and n2 = 50 with varying numbers of training points nt per task. (A usage sketch for one of these regimes appears after the table.)
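
Data-generation sketch. The following is a minimal Python sketch of the simulated data-generating process quoted above: Gaussian covariates and noise, task vectors drawn in a shared r-dimensional subspace, B a uniformly random subspace of Rd, and nt samples per task. It is not taken from the authors' repository; the function name and the √(1/r) task scaling are assumptions based on the quoted description.

import numpy as np

def generate_meta_train_data(d=100, r=5, t=20, n_t=25, seed=0):
    """Generate t linear-regression tasks sharing a common r-dimensional feature subspace B."""
    rng = np.random.default_rng(seed)
    # B: a uniformly random r-dimensional subspace of R^d (orthonormal columns via QR).
    B, _ = np.linalg.qr(rng.standard_normal((d, r)))
    X, y, task_ids = [], [], []
    for task in range(t):
        alpha = rng.standard_normal(r) / np.sqrt(r)   # task weights, alpha ~ sqrt(1/r) N(0, I_r) (assumed scaling)
        x = rng.standard_normal((n_t, d))             # covariates x_i ~ N(0, I_d)
        eps = rng.standard_normal(n_t)                # noise eps_i ~ N(0, 1)
        y.append(x @ (B @ alpha) + eps)
        X.append(x)
        task_ids.append(np.full(n_t, task))
    # n_1 = t * n_t pooled first-stage samples in total.
    return np.vstack(X), np.concatenate(y), np.concatenate(task_ids), B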
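Algorithm sketch. A hedged sketch of the two algorithms named in the Pseudocode row: a method-of-moments feature estimator (Algorithm 1) followed by linear regression on a new task using the estimated features (Algorithm 2). Taking the top-r eigenspace of the empirical moment matrix (1/n1) Σ yi² xi xiᵀ is one reading of the MoM construction, and the helper names are hypothetical; the paper's pseudocode is authoritative.

import numpy as np

def mom_feature_estimator(X, y, r):
    """Algorithm 1 (sketch): estimate the shared feature subspace from pooled task data.

    Builds the moment matrix (1/n) * sum_i y_i^2 x_i x_i^T and returns its
    top-r eigenvectors as an orthonormal d x r matrix B_hat.
    """
    n, d = X.shape
    M = (X * (y ** 2)[:, None]).T @ X / n
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    return eigvecs[:, -r:]                 # top-r eigenspace

def new_task_regression(B_hat, X_new, y_new):
    """Algorithm 2 (sketch): ordinary least squares in the learned r-dimensional feature space."""
    Z = X_new @ B_hat                      # project covariates onto the learned features
    alpha_hat, *_ = np.linalg.lstsq(Z, y_new, rcond=None)
    return B_hat @ alpha_hat               # d-dimensional regression parameter for the new task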
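Usage sketch. The snippet below ties the pieces together for one regime quoted in the Experiment Setup row (d = 100, r = 5, t = 20, nt = 25, and a scarce-data new task with n2 = 25 < d). It reuses the hypothetical helpers from the two sketches above and compares the meta-learned regression parameter (in the spirit of meta-LR-MoM) against the LR baseline that only uses new-task data.

import numpy as np

d, r, t, n_t, n_2 = 100, 5, 20, 25, 25
X, y, _, B = generate_meta_train_data(d=d, r=r, t=t, n_t=n_t, seed=0)

# Stage 1: estimate the shared feature subspace from the pooled meta-training data.
B_hat = mom_feature_estimator(X, y, r)

# Stage 2: a new (t+1)-st task with scarce data, generated from the same underlying B.
rng = np.random.default_rng(1)
alpha_new = rng.standard_normal(r) / np.sqrt(r)
X_new = rng.standard_normal((n_2, d))
y_new = X_new @ (B @ alpha_new) + rng.standard_normal(n_2)

w_meta = new_task_regression(B_hat, X_new, y_new)        # meta-learned regression (sketch)
w_lr, *_ = np.linalg.lstsq(X_new, y_new, rcond=None)     # LR baseline: new-task data only
print("meta-learned error:", np.linalg.norm(w_meta - B @ alpha_new))
print("baseline LR error:", np.linalg.norm(w_lr - B @ alpha_new))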