Stochastic Gradient Succeeds for Bandits
Authors: Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvari, Dale Schuurmans
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 7 presents a simulation study to verify the theoretical findings |
| Researcher Affiliation | Collaboration | ¹Google Research, Brain Team; ²University of Alberta; ³Georgia Tech; ⁴Google Research. |
| Pseudocode | Yes | Algorithm 1 Gradient bandit algorithm (without baselines) |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | No | The mean reward $r$ is randomly generated in $(0, 1)^K$. For each sampled action $a_t \sim \pi_{\theta_t}(\cdot)$, the observed reward is generated as $R_t(a_t) = r(a_t) + Z_t$, where $Z_t \sim \mathcal{N}(0, 1)$ is Gaussian noise. |
| Dataset Splits | No | The paper does not provide specific dataset split information (percentages, sample counts, citations to predefined splits, or detailed splitting methodology). |
| Hardware Specification | No | The paper describes simulation experiments but does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running them. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | The learning rate is $\eta = 0.01$. We use adversarial initialization, such that $\pi_{\theta_1}(a^*) < 1/K$. |
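
Taken together, the Pseudocode, Open Datasets, and Experiment Setup rows describe a complete simulation loop. Below is a minimal Python sketch of the gradient bandit update without baselines under that setup; only $\eta = 0.01$ comes from the paper, while the arm count `K`, horizon `T`, random seed, and the exact adversarial-initialization scheme are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T, eta = 10, 100_000, 0.01        # eta = 0.01 from the paper; K, T assumed

r = rng.uniform(0.0, 1.0, size=K)    # mean rewards drawn from (0, 1)^K
a_star = int(np.argmax(r))           # optimal action a*

# Adversarial initialization: give a* the smallest logit so that
# pi_{theta_1}(a*) < 1/K. The exact scheme used in the paper is assumed.
theta = np.zeros(K)
theta[a_star] = -1.0

def softmax(z):
    p = np.exp(z - z.max())
    return p / p.sum()

for t in range(T):
    pi = softmax(theta)              # policy pi_{theta_t}
    a = rng.choice(K, p=pi)          # sample a_t ~ pi_{theta_t}(.)
    R = r[a] + rng.normal()          # R_t(a_t) = r(a_t) + Z_t, Z_t ~ N(0, 1)

    # Stochastic softmax policy-gradient update without a baseline:
    # theta <- theta + eta * R * (e_{a_t} - pi_{theta_t})
    grad = -R * pi
    grad[a] += R
    theta += eta * grad

print(f"pi(a*) after {T} steps: {softmax(theta)[a_star]:.3f}")
```

If the theory holds, $\pi_{\theta_t}(a^*)$ should trend toward 1 despite the adversarial start; the sketch mirrors the described setup rather than reproducing the paper's exact experiment.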