Impact of Representation Learning in Linear Bandits
Authors: Jiaqi Yang, Wei Hu, Jason D. Lee, Simon Shaolei Du
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also present experiments on synthetic and real-world data to illustrate our theoretical findings and demonstrate the effectiveness of our proposed algorithms. |
| Researcher Affiliation | Academia | Jiaqi Yang, Tsinghua University, yangjq17@gmail.com; Wei Hu, Princeton University, huwei@cs.princeton.edu; Jason D. Lee, Princeton University, jasonlee@princeton.edu; Simon S. Du, University of Washington, ssdu@cs.washington.edu |
| Pseudocode | Yes | Algorithm 1: MLinGreedy: Multi-task Linear Bandit with Finite Actions; Algorithm 2: E2TC: Explore-Explore-Then-Commit |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We create a linear bandits problem on MNIST data (LeCun et al., 2010) |
| Dataset Splits | No | The paper mentions 'N = 10000' total rounds but does not specify train, validation, or test dataset splits for the experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not provide any specific software names with version numbers, nor any self-contained solvers or specialized packages with versions. |
| Experiment Setup | Yes | We fix K = 5 and N = 10000 for all simulations on finite-action setting. We vary k, d and T to compare Algorithm 1 and the naive algorithm. We emphasize that the y-axis in our figures corresponds to the regret per task, which is defined as $R^{N,T}/T$. We fix K = 5, N = 10000. We create a linear bandits problem on MNIST data (LeCun et al., 2010) to illustrate the effectiveness of our algorithm on real-world data. We fix K = 2 and create $T = \binom{10}{2}$ tasks and each task is parameterized by a pair (i, j), where $0 \le i < j \le 9$. We consider k = 2, 3 in our experiments. The noise $\varepsilon_{n,t} \sim N(0, 1)$ are i.i.d. Gaussian random variables. To verify our theoretical results, we consider a hyper-parameter $c \in \{0.5, 1, 1.5, 2\}$. For each c, we run E2TC with $N_1 = d^c k \sqrt{N/T}$ and $N_2 = k\sqrt{N}$. |
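
The MNIST task construction quoted in the Experiment Setup row can be sketched in a few lines. This is a minimal illustration, not the authors' code (the paper releases none): it enumerates one task per digit pair (i, j) with i < j, which gives 45 tasks, and shows the per-task regret normalization and the i.i.d. standard Gaussian reward noise. The names `tasks`, `regret_per_task`, and `noisy_reward` are hypothetical, and the numeric inputs are placeholders rather than reported results.

```python
import random
from itertools import combinations

# Each task is a digit pair (i, j) with 0 <= i < j <= 9, as in the quoted setup.
tasks = list(combinations(range(10), 2))
T = len(tasks)  # 10 choose 2 = 45 tasks


def regret_per_task(total_regret: float, num_tasks: int) -> float:
    """Normalize the total regret over all tasks by the number of tasks."""
    return total_regret / num_tasks


def noisy_reward(mean_reward: float, rng: random.Random) -> float:
    """Add standard Gaussian noise (mean 0, variance 1) to a mean reward."""
    return mean_reward + rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, an assumption for repeatability
    print(f"number of tasks T = {T}")
    print(f"first task = {tasks[0]}, last task = {tasks[-1]}")
    # Placeholder mean reward; the paper's reward model is not reproduced here.
    print(f"one noisy reward sample: {noisy_reward(0.5, rng):.3f}")
```

Enumerating pairs with `itertools.combinations` keeps the constraint i < j implicit, so no pair is duplicated or reversed.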