Theoretical bounds on estimation error for meta-learning

Authors: James Lucas, Mengye Ren, Irene Raissa KAMENI KAMENI, Toniann Pitassi, Richard Zemel

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our primary contributions can be summarized as follows: We introduce novel lower bounds on minimax risk of parameter estimation in meta-learning. Through these bounds, we compare the relative utility of samples from meta-training tasks and the novel task and emphasize the importance of the relationship between the tasks. We provide novel upper bounds on the error rate for estimation in a hierarchical meta-linear-regression problem, which we verify through an empirical evaluation.
Researcher Affiliation | Academia | No clear institutional affiliations (university names, company names, or email domains) are provided in the extracted text to allow for classification of author affiliation types.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Code to reproduce these plots is provided in the supplementary materials with our submission."
Open Datasets | No | The paper describes generating synthetic data for "polynomial regression over inputs in the range [-1, 1]" and "sinusoid functions by placing a prior over the amplitude and phase". It does not provide concrete access information (link, DOI, formal citation) for a publicly available or open dataset, but rather describes a data generation process.
Dataset Splits | No | The paper defines per-task sample counts: n datapoints in the support set of training tasks, k in the support set of testing tasks, nq in the query set of training tasks, and kq in the query set of testing tasks. While this specifies the amount of data used to train and test the meta-learner per task, the paper does not give explicit train/validation/test splits (e.g., percentages or fixed sample counts) for a static dataset, since all data are generated synthetically.
Hardware Specification | No | The paper reports only total runtime and hyperparameter settings, with no hardware details: "This experiment therefore lasted 20 hours in total." M = 50, n ∈ {20, 200}, k ∈ {100, 1000}, σ ∈ [10^-8, 1.5], Mq = 100, eps per batch = 25, train ampl range = [1, 4], train phase range = [0, π/2], val ampl range = [3, 5], val phase range = [0, π/2], inner steps = 5, inner lr = 10^-3, meta lr = 10^-3.
Software Dependencies | No | The paper mentions using the MAML algorithm, SGD, and Adam for optimization, but does not specify any software libraries or frameworks with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x).
Experiment Setup | Yes | "For all of these experiments we used a fully connected network with 6 layers and 40 hidden units per layer. The network is trained using the MAML algorithm (Finn et al., 2017) with 5 inner steps using SGD with an inner learning rate of 10^-3. We used Adam for the outer loop learning with a learning rate of 10^-3." Hyperparameter descriptions: σ: noise at test time; M: number of training tasks; Mq: number of testing tasks; eps per batch: episodes per batch; train ampl range: range of amplitude at training; train phase range: range of phase at training; val ampl range: range of amplitude at testing; val phase range: range of phase at testing; inner steps: number of MAML inner steps; inner lr: learning rate used to optimize the model parameters; meta lr: learning rate used to optimize the meta-learner parameters; n: number of datapoints at training tasks (support set); k: number of datapoints at testing tasks (support set); nq: number of datapoints at training tasks (query set); kq: number of datapoints at testing tasks (query set).
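The synthetic sinusoid data generation described above can be sketched in plain Python. The paper states only that a prior is placed over the amplitude and phase, with the ranges listed among the hyperparameters; the uniform priors and the input range [-5, 5] in this sketch are illustrative assumptions, not details confirmed by the paper:

```python
import math
import random

def sample_sinusoid_task(n_support, n_query,
                         ampl_range=(1.0, 4.0),
                         phase_range=(0.0, math.pi / 2),
                         noise_sigma=0.0,
                         rng=random):
    """Sample one sinusoid regression task y = A * sin(x + phi).

    Amplitude A and phase phi are drawn uniformly from the given ranges
    (matching the "train ampl range" / "train phase range" settings);
    inputs are drawn uniformly from [-5, 5] (an assumption).  Returns
    (A, phi, support, query), where support and query are lists of
    (x, y) pairs with optional Gaussian observation noise.
    """
    A = rng.uniform(*ampl_range)
    phi = rng.uniform(*phase_range)

    def draw(n):
        points = []
        for _ in range(n):
            x = rng.uniform(-5.0, 5.0)
            y = A * math.sin(x + phi) + rng.gauss(0.0, noise_sigma)
            points.append((x, y))
        return points

    return A, phi, draw(n_support), draw(n_query)
```

With the paper's settings, one would call this once per episode, using the train ranges for meta-training tasks and the val ranges ([3, 5] amplitude) for held-out tasks.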
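The MAML inner loop referenced in the setup (5 SGD steps on the support set) can be illustrated on a 1-D linear model with squared loss. This is a deliberate simplification: the paper adapts a 6-layer MLP, and full MAML also differentiates through the adaptation so Adam can update the meta-parameters on the query loss. The function names here are hypothetical, for illustration only:

```python
def mse_grad(w, b, data):
    """Mean-squared-error loss and its gradient for the model y ≈ w*x + b."""
    n = len(data)
    loss = gw = gb = 0.0
    for x, y in data:
        err = w * x + b - y
        loss += err * err / n
        gw += 2.0 * err * x / n   # d(loss)/dw
        gb += 2.0 * err / n       # d(loss)/db
    return loss, gw, gb

def maml_adapt(w, b, support, inner_steps=5, inner_lr=1e-3):
    """MAML inner loop: a few SGD steps on the support set, starting
    from the meta-parameters (w, b); defaults mirror the paper's
    inner steps = 5 and inner lr = 10^-3."""
    for _ in range(inner_steps):
        _, gw, gb = mse_grad(w, b, support)
        w -= inner_lr * gw
        b -= inner_lr * gb
    return w, b
```

In the full algorithm, the query-set loss at the adapted parameters (w, b) would then drive the outer-loop Adam update of the meta-parameters.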