reproducibilityindex.ai

Stochastic Gradient Succeeds for Bandits

Authors: Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvari, Dale Schuurmans

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Section 7 presents a simulation study to verify the theoretical findings
Researcher Affiliation	Collaboration	1 Google Research, Brain Team; 2 University of Alberta; 3 Georgia Tech; 4 Google Research.
Pseudocode	Yes	Algorithm 1 Gradient bandit algorithm (without baselines)
Open Source Code	No	The paper does not provide any statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets	No	The mean reward r is randomly generated in (0, 1)^K. For each sampled action a_t ∼ π_{θ_t}(·), the observed reward is generated as R_t(a_t) = r(a_t) + Z_t, where Z_t ∼ N(0, 1) is Gaussian noise.
Dataset Splits	No	The paper does not provide specific dataset split information (percentages, sample counts, citations to predefined splits, or detailed splitting methodology).
Hardware Specification	No	The paper describes simulation experiments but does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running them.
Software Dependencies	No	The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup	Yes	The learning rate is η = 0.01. We use adversarial initialization, such that π_{θ_1}(a*) < 1/K.
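
The simulation described in the table rows above can be sketched roughly as follows. This is a minimal illustration, not the authors' code (the table notes that none was released). It assumes Algorithm 1 is the standard softmax gradient bandit update without a baseline, θ(a) ← θ(a) + η (1{a = a_t} − π_θ(a)) R_t(a_t). The function name `run_gradient_bandit`, the defaults for K and T, the seed, and the specific way the adversarial initialization is built (lowering the logit of the best arm) are all assumptions made for illustration; only η = 0.01, the (0, 1)^K mean rewards, the N(0, 1) noise, and the condition π_{θ_1}(a*) < 1/K come from the table.

```python
import numpy as np


def softmax(theta):
    """Numerically stable softmax over a 1-D logit vector."""
    z = theta - theta.max()
    e = np.exp(z)
    return e / e.sum()


def run_gradient_bandit(K=10, T=10_000, eta=0.01, seed=0):
    """Softmax gradient bandit without baseline (assumed form of Algorithm 1).

    K, T and seed are hypothetical defaults; eta = 0.01 is from the table.
    Returns (mean rewards, optimal arm index, initial policy, final policy).
    """
    rng = np.random.default_rng(seed)

    # Mean rewards drawn uniformly from the unit interval, one per arm.
    r = rng.uniform(0.0, 1.0, size=K)
    a_star = int(np.argmax(r))

    # Assumed adversarial initialization: lower the optimal arm's logit
    # so that its initial probability is strictly below 1/K.
    theta = np.zeros(K)
    theta[a_star] = -1.0
    pi_init = softmax(theta)

    for _ in range(T):
        pi = softmax(theta)
        a = rng.choice(K, p=pi)        # sample a_t from the current policy
        R = r[a] + rng.normal()        # noisy reward: r(a_t) + Z_t, Z_t ~ N(0, 1)

        # Stochastic gradient step: (1{a == a_t} - pi(a)) * R for every arm a.
        grad = -pi * R
        grad[a] += R
        theta += eta * grad

    return r, a_star, pi_init, softmax(theta)


if __name__ == "__main__":
    r, a_star, pi_init, pi_final = run_gradient_bandit()
    print("initial pi(a*):", pi_init[a_star])
    print("final   pi(a*):", pi_final[a_star])
```

Comparing the initial and final probability of the optimal arm gives a quick check on whether the policy is moving toward a* for a given seed and horizon. With a learning rate this small, a long horizon may be needed before the change is visible.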