Meta-learning with Stochastic Linear Bandits

Authors: Leonardo Cella, Alessandro Lazaric, Massimiliano Pontil

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation. Finally, in Section 6 we compare experimentally the proposed methods with respect to the standard OFUL algorithm on both synthetic and real data."
Researcher Affiliation | Collaboration | University of Milan; Istituto Italiano di Tecnologia; Facebook AI Research
Pseudocode | Yes | Algorithm 1 (Within-Task Algorithm: BIAS-OFUL); Algorithm 2 (Meta-Algorithm: Estimating ĥ_λ)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology, such as a repository link, an explicit code-release statement, or code in supplementary materials.
Open Datasets | Yes | "The first dataset we considered is extracted from the music streaming service Last.fm (Cantador et al., 2011). Here we consider the Movielens data (Harper & Konstan, 2015)."
Dataset Splits | No | The paper does not provide the dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not report the hardware (GPU/CPU models, processor speeds, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper does not list the ancillary software, such as library or solver names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | "Each task consists of T = 50 rounds, in which we have K = 5 arms of size d = 20. The regularization hyper-parameter λ was selected over a logarithmic scale. Furthermore, in order to let the tasks be simpler, we reduced the variance of the noisy components affecting rewards to 0.1."
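
The Pseudocode and Experiment Setup rows above describe a biased variant of OFUL: within each task the least-squares estimate is regularized toward a bias vector h learned across tasks, and the synthetic evaluation uses T = 50 rounds, K = 5 arms of dimension d = 20, and reward-noise variance 0.1. Since no code is released (see the Open Source Code row), the snippet below is only a minimal illustrative sketch of that within-task update under those stated parameters. The function names, the fixed exploration constant beta, the random unit-norm arm sets, and the fact that h is fixed in advance rather than estimated by the meta-algorithm are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def bias_ridge(X, y, h, lam):
    """Ridge estimate regularized toward the bias vector h:
    argmin_theta ||X theta - y||^2 + lam * ||theta - h||^2."""
    d = h.shape[0]
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ y + lam * h
    return np.linalg.solve(A, b), A

def bias_oful_task(theta_star, h, lam=1.0, T=50, K=5, d=20,
                   noise_var=0.1, beta=1.0, rng=None):
    """One task of a BIAS-OFUL-style loop (illustrative sketch, not the paper's code).
    Each round draws K random unit-norm arms and pulls the one maximizing an
    optimistic index built from the bias-regularized ridge estimate."""
    rng = np.random.default_rng(rng)
    X_hist, y_hist = np.empty((0, d)), np.empty(0)
    theta_hat, A = h.copy(), lam * np.eye(d)
    regret = 0.0
    for _ in range(T):
        arms = rng.normal(size=(K, d))
        arms /= np.linalg.norm(arms, axis=1, keepdims=True)
        A_inv = np.linalg.inv(A)
        widths = np.sqrt(np.einsum("kd,de,ke->k", arms, A_inv, arms))
        ucb = arms @ theta_hat + beta * widths          # optimistic index
        k = int(np.argmax(ucb))
        reward = arms[k] @ theta_star + rng.normal(scale=np.sqrt(noise_var))
        regret += float(np.max(arms @ theta_star) - arms[k] @ theta_star)
        X_hist = np.vstack([X_hist, arms[k]])
        y_hist = np.append(y_hist, reward)
        theta_hat, A = bias_ridge(X_hist, y_hist, h, lam)
    return regret

# Toy usage: a task drawn close to a common bias vector h_bar.
rng = np.random.default_rng(0)
h_bar = rng.normal(size=20)
h_bar /= np.linalg.norm(h_bar)
theta = h_bar + 0.05 * rng.normal(size=20)               # small task variance
print("regret, biased toward h_bar:", bias_oful_task(theta, h=h_bar, rng=1))
print("regret, unbiased (h = 0)   :", bias_oful_task(theta, h=np.zeros(20), rng=1))
```

In the paper's setup the regularization λ is selected over a logarithmic grid (e.g. something like np.logspace) and the meta-algorithm produces the bias estimate ĥ_λ across tasks; here h is simply supplied up front to keep the sketch short.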