Stochastic Gradient Succeeds for Bandits

Authors: Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvári, Dale Schuurmans

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Section 7 presents a simulation study to verify the theoretical findings. |
| Researcher Affiliation | Collaboration | ¹Google Research, Brain Team; ²University of Alberta; ³Georgia Tech; ⁴Google Research |
| Pseudocode | Yes | Algorithm 1: Gradient bandit algorithm (without baselines). A runnable sketch follows this table. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | No | The mean reward $r$ is randomly generated in $(0, 1)^K$. For each sampled action $a_t \sim \pi_{\theta_t}(\cdot)$, the observed reward is generated as $R_t(a_t) = r(a_t) + Z_t$, where $Z_t \sim \mathcal{N}(0, 1)$ is Gaussian noise. |
| Dataset Splits | No | The paper does not provide dataset split information (percentages, sample counts, citations to predefined splits, or a detailed splitting methodology). |
| Hardware Specification | No | The paper describes simulation experiments but does not specify the hardware (e.g., exact GPU/CPU models, processor types, or memory amounts) used to run them. |
| Software Dependencies | No | The paper does not list the ancillary software (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | The learning rate is $\eta = 0.01$. We use adversarial initialization, such that $\pi_{\theta_1}(a^*) < 1/K$. |