Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Stochastic Bandits with ReLU Neural Networks

Authors: Kan Xu, Hamsa Bastani, Surbhi Goel, Osbert Bastani

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We compare our algorithm OFU-ReLU with several benchmarks, including OFUL (Abbasi-Yadkori et al., 2011), which assumes the true model is linear and introduces misspecification errors, and three different versions of NeuralUCB (Zhou et al., 2020), i.e., NeuralUCB-F, NeuralUCB-T and NeuralUCB-TW. Particularly, NeuralUCB-F follows the setup in 7.1 of (Zhou et al., 2020) with m = 20 neurons and two layers; NeuralUCB-T assumes the knowledge of the neural network structure of the true reward, i.e., m = k neurons and one layer; and NeuralUCB-TW inherits the structure from NeuralUCB-T but expands the layer size into m = 2k. We consider the true model of a ReLU structure as in (3), with multiple settings presented in Figure 2.
Researcher Affiliation	Academia	1 Arizona State University, Arizona, USA; 2 University of Pennsylvania, Pennsylvania, USA. Correspondence to: Kan Xu <EMAIL>.
Pseudocode	Yes	Algorithm 1 OFU-ReLU; Algorithm 2 OFU-ReLU+
Open Source Code	Yes	Source code is available at https://github.com/kanxu526/ReLUBandit.
Open Datasets	No	The paper uses synthetically generated data: 'We consider the true model of a ReLU structure as in (3)... The noise follows a normal distribution N(0, 0.01). We randomly draw 1,000 arms from the unit sphere in each round t and choose an optimal arm from this arm set.' It does not refer to a publicly available or open dataset with access information.
Dataset Splits	No	The paper describes synthetic data generation for its experiments but does not specify any explicit training, validation, or test dataset splits. The data is generated dynamically in each round ('We randomly draw 1,000 arms from the unit sphere in each round t').
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments. It mentions 'GPU' in the context of previous work but not for its own setup.
Software Dependencies	No	The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup	Yes	We consider the true model of a ReLU structure as in (3)... The noise follows a normal distribution N(0, 0.01). We randomly draw 1,000 arms from the unit sphere in each round t and choose an optimal arm from this arm set. ... For OFUL and OFU-ReLU, we use the theoretically suggested confidence ellipsoid for UCB. Since we do not know the gap ν, we set the length of exploration phase for OFU-ReLU to be 20 for our method. We tune the hyperparameters λ for all the methods.
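
The data generation quoted in the Experiment Setup row can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the reward model (3) is not reproduced on this page, so the sum-of-ReLU-neurons form below is an assumption, as is reading N(0, 0.01) as a variance of 0.01. The function names, the dimension, and the number of neurons are hypothetical.

```python
import numpy as np

def sample_unit_sphere(rng, n_arms, dim):
    """Draw n_arms points uniformly from the unit sphere in R^dim."""
    x = rng.standard_normal((n_arms, dim))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def relu_reward(arms, theta):
    """ASSUMED reward model: sum over k neurons of ReLU(theta_j . x).

    The paper's equation (3) is not shown in this report, so this form
    is only a stand-in for a 'ReLU structure'.
    """
    return np.maximum(arms @ theta.T, 0.0).sum(axis=1)

def simulate_round(rng, theta, n_arms=1000, noise_var=0.01):
    """One round: fresh arm set, oracle-best arm, and a noisy reward for it.

    noise_var=0.01 assumes N(0, 0.01) denotes mean 0 and variance 0.01.
    """
    dim = theta.shape[1]
    arms = sample_unit_sphere(rng, n_arms, dim)
    mean = relu_reward(arms, theta)
    best = int(np.argmax(mean))  # oracle choice, useful for computing regret
    noisy = mean[best] + rng.normal(0.0, np.sqrt(noise_var))
    return arms, best, mean, noisy

# Hypothetical sizes: k = 3 neurons, d = 5 dimensions.
rng = np.random.default_rng(0)
theta = rng.standard_normal((3, 5))
arms, best, mean, noisy = simulate_round(rng, theta)
```

Normalizing Gaussian draws gives a uniform distribution on the sphere, which matches the quoted "randomly draw 1,000 arms from the unit sphere" under the assumption that the draw is uniform.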