Meta-learning with Stochastic Linear Bandits
Authors: Leonardo Cella, Alessandro Lazaric, Massimiliano Pontil
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation. Finally, in Section 6 we experimentally compare the proposed methods against the standard OFUL algorithm on both synthetic and real data. |
| Researcher Affiliation | Collaboration | (1) University of Milan, (2) Istituto Italiano di Tecnologia, (3) Facebook AI Research. |
| Pseudocode | Yes | Algorithm 1 (Within-Task Algorithm: BIAS-OFUL); Algorithm 2 (Meta-Algorithm: Estimating b̂_λ). Illustrative sketches of both appear after this table. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, such as a specific repository link, explicit code release statement, or code in supplementary materials. |
| Open Datasets | Yes | The first dataset we considered is extracted from the music streaming service Last.fm (Cantador et al., 2011). The second is the Movielens data (Harper & Konstan, 2015). |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | Each task consists of T = 50 rounds, in which we have K = 5 arms of size d = 20. The regularization hyper-parameter λ was selected over a logarithmic scale. Furthermore, to make the tasks simpler, we reduced the variance of the noise affecting the rewards to 0.1. A sketch of this synthetic setup follows the table. |
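
The pseudocode row only names the two routines, so the following is a minimal sketch of the within-task step, assuming the standard biased-ridge variant of OFUL in which the regularizer shrinks the estimate towards a bias vector b and arms are chosen optimistically. The function name `bias_oful`, the `reward_fn` callback, and the confidence scale `beta` are illustrative assumptions, not the paper's exact pseudocode.

```python
import numpy as np

def bias_oful(arm_sets, reward_fn, b, lam=1.0, beta=1.0):
    """One task of OFUL with a ridge penalty biased towards b.

    arm_sets : list of (K, d) arrays, the arms available at each round
    reward_fn: maps a chosen arm (d,) to a noisy scalar reward
    b        : (d,) vector the regularizer shrinks the estimate towards
    """
    d = arm_sets[0].shape[1]
    V = lam * np.eye(d)      # regularized Gram matrix: sum_s x_s x_s^T + lam * I
    xr = np.zeros(d)         # running sum of r_s * x_s
    total_reward = 0.0
    for X in arm_sets:
        V_inv = np.linalg.inv(V)
        # closed form of  min_theta  sum_s (x_s^T theta - r_s)^2 + lam ||theta - b||^2
        theta = V_inv @ (xr + lam * b)
        # optimistic choice: estimated mean plus an elliptical exploration bonus
        bonus = beta * np.sqrt(np.sum((X @ V_inv) * X, axis=1))
        x = X[np.argmax(X @ theta + bonus)]
        r = reward_fn(x)
        V += np.outer(x, x)
        xr += r * x
        total_reward += r
    return theta, total_reward
```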
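
Algorithm 2 aggregates data collected across tasks into the bias estimate b̂_λ. The sketch below averages per-task ridge estimates, which is one plausible reading; the paper's exact estimator may weight or pool the logged data differently.

```python
import numpy as np

def estimate_bias(task_logs, lam=1.0):
    """Meta-step: pool per-task data into a bias estimate b_hat.

    task_logs: list of (X_j, r_j) pairs, the (T, d) contexts and (T,)
               rewards recorded in task j
    """
    estimates = []
    for X, r in task_logs:
        d = X.shape[1]
        # per-task ridge estimate from the logged interactions
        theta_j = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ r)
        estimates.append(theta_j)
    # plain average across tasks -- an assumption; the paper's b_hat_lambda
    # may use a different aggregation of the within-task estimates
    return np.mean(estimates, axis=0)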
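
The experiment-setup row translates directly into a small synthetic configuration. The sketch below uses the reported values T = 50, K = 5, d = 20, and reward-noise variance 0.1; the λ grid endpoints, the seed, and the unit-norm task parameter are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)        # seed is an arbitrary choice

T, K, d = 50, 5, 20                   # rounds, arms, dimension (from the paper)
noise_var = 0.1                       # reward-noise variance (from the paper)
lambdas = np.logspace(-3, 3, 7)       # log-scale grid for lambda; range is a guess

theta_star = rng.normal(size=d)       # hypothetical task parameter
theta_star /= np.linalg.norm(theta_star)

arm_sets = [rng.normal(size=(K, d)) for _ in range(T)]

def reward_fn(x):
    # linear reward corrupted by Gaussian noise with variance 0.1
    return x @ theta_star + rng.normal(scale=np.sqrt(noise_var))

# selecting lambda by running the within-task sketch above on each grid point:
# for lam in lambdas:
#     theta_hat, total = bias_oful(arm_sets, reward_fn, b=np.zeros(d), lam=lam)
```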