Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Impact of Representation Learning in Linear Bandits
Authors: Jiaqi Yang, Wei Hu, Jason D. Lee, Simon Shaolei Du
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also present experiments on synthetic and real-world data to illustrate our theoretical findings and demonstrate the effectiveness of our proposed algorithms. |
| Researcher Affiliation | Academia | Jiaqi Yang (Tsinghua University), Wei Hu (Princeton University), Jason D. Lee (Princeton University), Simon S. Du (University of Washington) |
| Pseudocode | Yes | Algorithm 1: MLinGreedy: Multi-task Linear Bandit with Finite Actions; Algorithm 2: E2TC: Explore-Explore-Then-Commit |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We create a linear bandits problem on MNIST data (LeCun et al., 2010) |
| Dataset Splits | No | The paper mentions 'N = 10000' total rounds but does not specify train, validation, or test dataset splits for the experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not provide any specific software names with version numbers, nor any self-contained solvers or specialized packages with versions. |
| Experiment Setup | Yes | We fix $K = 5$ and $N = 10000$ for all simulations on finite-action setting. We vary $k$, $d$ and $T$ to compare Algorithm 1 and the naive algorithm. We emphasize that the y-axis in our figures corresponds to the regret per task, which is defined as $R^{N,T}/T$. We fix $K = 5$, $N = 10000$. We create a linear bandits problem on MNIST data (LeCun et al., 2010) to illustrate the effectiveness of our algorithm on real-world data. We fix $K = 2$ and create $T = \binom{10}{2}$ tasks and each task is parameterized by a pair $(i, j)$, where $0 \le i < j \le 9$. We consider $k = 2, 3$ in our experiments. The noise $\varepsilon_{n,t} \sim N(0, 1)$ are i.i.d. Gaussian random variables. To verify our theoretical results, we consider a hyper-parameter $c \in \{0.5, 1, 1.5, 2\}$. For each $c$, we run E2TC with N1 = dck q T and N2 = k N. |
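
The task construction quoted in the Experiment Setup row can be made concrete with a small sketch. This is an illustration only, not the authors' code (no source code is reported as available for the paper). The names `tasks`, `c_values`, `k_values`, and `regret_per_task` are hypothetical. It enumerates the digit pairs (i, j) with 0 ≤ i < j ≤ 9 that define the MNIST tasks, lists the quoted hyper-parameter values, and shows the per-task normalization used for the y-axis.

```python
from itertools import combinations

# Each task is a pair of distinct MNIST digit classes (i, j) with 0 <= i < j <= 9.
tasks = list(combinations(range(10), 2))
T = len(tasks)  # number of tasks, i.e. "10 choose 2"

# Values quoted in the table: the hyper-parameter c and the dimension k.
c_values = [0.5, 1, 1.5, 2]
k_values = [2, 3]


def regret_per_task(total_regret: float, num_tasks: int) -> float:
    """Divide a total regret by the number of tasks (the quantity plotted on the y-axis)."""
    return total_regret / num_tasks


print(T)  # 45
```

With ten digit classes this gives 45 tasks, so a total regret of 90 over all tasks would correspond to a per-task regret of 2.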