Stochastic Bandits with ReLU Neural Networks

Authors: Kan Xu, Hamsa Bastani, Surbhi Goel, Osbert Bastani

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare our algorithm OFU-ReLU with several benchmarks, including OFUL (Abbasi-Yadkori et al., 2011), which assumes the true model is linear and introduces misspecification errors, and three different versions of NeuralUCB (Zhou et al., 2020), i.e., NeuralUCB-F, NeuralUCB-T and NeuralUCB-TW. Particularly, NeuralUCB-F follows the setup in Section 7.1 of (Zhou et al., 2020) with m = 20 neurons and two layers; NeuralUCB-T assumes the knowledge of the neural network structure of the true reward, i.e., m = k neurons and one layer; and NeuralUCB-TW inherits the structure from NeuralUCB-T but expands the layer size into m = 2k. We consider the true model of a ReLU structure as in (3), with multiple settings presented in Figure 2.
Researcher Affiliation | Academia | (1) Arizona State University, Arizona, USA; (2) University of Pennsylvania, Pennsylvania, USA. Correspondence to: Kan Xu <kanxu1@asu.edu>.
Pseudocode | Yes | Algorithm 1 OFU-ReLU; Algorithm 2 OFU-ReLU+
Open Source Code | Yes | Source code is available at https://github.com/kanxu526/ReLUBandit.
Open Datasets | No | The paper uses synthetically generated data: 'We consider the true model of a ReLU structure as in (3)... The noise follows a normal distribution N(0, 0.01). We randomly draw 1,000 arms from the unit sphere in each round t and choose an optimal arm from this arm set.' It does not refer to a publicly available or open dataset with access information.
Dataset Splits | No | The paper describes synthetic data generation for its experiments but does not specify any explicit training, validation, or test dataset splits. The data is generated dynamically in each round ('We randomly draw 1,000 arms from the unit sphere in each round t').
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. It mentions 'GPU' in the context of previous work but not for its own setup.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | We consider the true model of a ReLU structure as in (3)... The noise follows a normal distribution N(0, 0.01). We randomly draw 1,000 arms from the unit sphere in each round t and choose an optimal arm from this arm set. ... For OFUL and OFU-ReLU, we use the theoretically suggested confidence ellipsoid for UCB. Since we do not know the gap ν, we set the length of exploration phase for OFU-ReLU to be 20 for our method. We tune the hyperparameters λ for all the methods.
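
The synthetic setup quoted above (arms drawn from the unit sphere each round, a ReLU reward model, Gaussian noise) can be illustrated with a minimal sketch. This is not the authors' code. It assumes the true model sums k ReLU neurons, f(x) = sum_i max(0, <theta_i, x>), which may differ from equation (3) in the paper, and it reads 0.01 in N(0, 0.01) as a variance (standard deviation 0.1). The dimension d, the number of neurons k, the helper names, and the uniformly random placeholder policy are all hypothetical.

```python
import math
import random


def draw_arms(n_arms, d, rng):
    """Draw n_arms points uniformly from the unit sphere in R^d
    by normalizing standard Gaussian vectors."""
    arms = []
    for _ in range(n_arms):
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        arms.append([v / norm for v in g])
    return arms


def relu_reward(theta, x):
    """Assumed reward model: sum over neurons of max(0, <theta_i, x>)."""
    return sum(max(0.0, sum(t * v for t, v in zip(row, x))) for row in theta)


def play_round(theta, n_arms, noise_std, rng, policy=None):
    """Simulate one round: draw a fresh arm set, pick an arm, and return
    the chosen arm, its noisy reward, and the instantaneous regret."""
    d = len(theta[0])
    arms = draw_arms(n_arms, d, rng)
    means = [relu_reward(theta, x) for x in arms]
    best = max(means)  # expected reward of the optimal arm in this arm set
    # Placeholder policy: uniformly random unless a policy is supplied.
    idx = policy(arms) if policy else rng.randrange(n_arms)
    reward = means[idx] + rng.gauss(0.0, noise_std)
    return arms[idx], reward, best - means[idx]


rng = random.Random(0)
d, k = 5, 3                   # hypothetical dimension and neuron count
theta = draw_arms(k, d, rng)  # hypothetical unit-norm neuron weights
total = 0.0
for t in range(50):
    _, _, regret = play_round(theta, 1000, 0.1, rng)  # std 0.1 = sqrt(0.01)
    total += regret
print("cumulative regret of the random policy:", total)
```

Any bandit algorithm can be plugged in through the `policy` argument, which receives the round's arm set and returns the index of the arm to pull. The random policy here only serves as a baseline for the regret computation.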