Neural Contextual Bandits with UCB-based Exploration

Authors: Dongruo Zhou, Lihong Li, Quanquan Gu

ICML 2020

Reproducibility assessment (variable, result, and the LLM's supporting response):
Research Type: Experimental
LLM Response: In this section, we evaluate NeuralUCB empirically and compare it with seven representative baselines: (1) LinUCB, which is also based on UCB but adopts a linear representation; (2) GLMUCB (Filippi et al., 2010), which applies a nonlinear link function over a linear function; (3) KernelUCB (Valko et al., 2013), a kernelised UCB algorithm which makes use of a predefined kernel function; (4) BootstrappedNN (Efron, 1982; Riquelme et al., 2018), which simultaneously trains a set of neural networks on bootstrapped samples and at every round chooses an action based on the prediction of a randomly picked model; (5) Neural ϵ-Greedy, which replaces the UCB-based exploration in Algorithm 1 with ϵ-greedy; (6) NeuralUCB0, as described in Section 3; and (7) Neural ϵ-Greedy0, the same as NeuralUCB0 but with ϵ-greedy exploration. We use cumulative regret as the performance metric. Figures 1 and 2 show the cumulative regret of all algorithms.
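For orientation, the UCB-based exploration that separates NeuralUCB from its ϵ-greedy counterparts scores each arm by the network prediction plus a gradient-based confidence bonus, f(x; θ) + γ·√(g(x)⊤ Z⁻¹ g(x) / m). Below is a minimal PyTorch sketch of that scoring step, assuming a scalar-output `model` and a precomputed inverse design matrix `Z_inv` (both names are ours; the rank-one update of Z itself is omitted):

```python
import torch

def ucb_scores(model, contexts, Z_inv, gamma, m):
    """Score each arm with NeuralUCB's optimistic estimate:
    f(x_a; theta) + gamma * sqrt(g(x_a)^T Z^{-1} g(x_a) / m),
    where g(x_a) is the gradient of the prediction w.r.t. theta."""
    scores = []
    for x in contexts:                      # one context vector per arm
        model.zero_grad()
        pred = model(x)                     # scalar estimated reward
        pred.backward()                     # fills parameter gradients
        g = torch.cat([p.grad.flatten() for p in model.parameters()])
        bonus = gamma * torch.sqrt(g @ Z_inv @ g / m)
        scores.append((pred + bonus).item())
    return scores                           # play the arm with the largest score
```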
Researcher Affiliation: Collaboration
LLM Response: ¹Department of Computer Science, University of California, Los Angeles, CA 90095, USA; ²Google Research, USA.
Pseudocode: Yes
LLM Response: Algorithm 1: NeuralUCB; Algorithm 2: TrainNN
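Algorithm 2 (TrainNN) fits the current network to all observed rewards by gradient descent on a squared loss regularized toward the random initialization θ₀. A sketch under that reading follows (the name `train_nn` and the flat-parameter handling are ours; `theta0` is the detached, flattened initial parameter vector, and the paper's experiments substitute minibatch SGD for full gradient steps):

```python
import torch

def train_nn(model, theta0, contexts, rewards, lam, m, eta, J):
    """J gradient steps on L(theta) = sum_i (f(x_i; theta) - r_i)^2 / 2
    + m * lam * ||theta - theta0||^2 / 2, regularizing toward theta0."""
    opt = torch.optim.SGD(model.parameters(), lr=eta)
    for _ in range(J):
        opt.zero_grad()
        preds = torch.stack([model(x) for x in contexts])
        theta = torch.cat([p.flatten() for p in model.parameters()])
        loss = 0.5 * ((preds - rewards) ** 2).sum() \
             + 0.5 * m * lam * ((theta - theta0) ** 2).sum()
        loss.backward()
        opt.step()
```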
Open Source Code: No
LLM Response: The paper does not provide any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets: Yes
LLM Response: We evaluate our algorithms on real-world datasets from the UCI Machine Learning Repository (Dua & Graff, 2017): covertype, magic, and statlog. We also evaluate our algorithms on the mnist dataset (LeCun et al., 1998).
Dataset Splits: No
LLM Response: The paper describes converting classification datasets into K-armed contextual bandits and reshuffling the order of contexts, but does not provide train/validation/test splits (e.g., percentages or sample counts) in the conventional sense.
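The conversion mentioned here is the standard disjoint encoding: a k-class example with feature vector x ∈ ℝ^d becomes k arms, where arm a's context places x in the a-th block of a dk-dimensional vector, and the reward is 1 only when the chosen arm matches the true label. A NumPy sketch (the function name is ours):

```python
import numpy as np

def make_bandit_round(x, label, k):
    """Turn one classification example into one k-armed bandit round."""
    d = x.shape[0]
    contexts = np.zeros((k, k * d))
    for a in range(k):
        contexts[a, a * d:(a + 1) * d] = x           # block-diagonal placement
    rewards = (np.arange(k) == label).astype(float)  # 1 for the true class, else 0
    return contexts, rewards
```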
Hardware Specification: No
LLM Response: The paper does not provide specific details about the hardware used for running experiments, such as exact GPU/CPU models, memory, or cloud instance types.
Software Dependencies: No
LLM Response: The paper does not provide specific version numbers for software dependencies or libraries used in the experiments.
Experiment Setup: Yes
LLM Response: For all five neural algorithms, we choose a two-layer neural network f(x; θ) = √m · W₂σ(W₁x) with network width m = 20, where θ = [vec(W₁)⊤, vec(W₂)⊤]⊤ ∈ ℝ^p and p = md + m = 420. Moreover, we set γₜ = γ in NeuralUCB and do a grid search over {0.01, 0.1, 1, 10}. For Neural ϵ-Greedy and Neural ϵ-Greedy0, we do a grid search for ϵ over {0.001, 0.01, 0.1, 0.2}. For BootstrappedNN, we follow Riquelme et al. (2018) and set the number of models to 10 and the transition probability to 0.8. To accelerate training, for BootstrappedNN, NeuralUCB, and Neural ϵ-Greedy, we update the parameter θₜ by TrainNN every 50 rounds. We use stochastic gradient descent with batch size 50 and J = t at round t, and do a grid search for step size η over {0.001, 0.01, 0.1}.
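As a concreteness check, with m = 20 the quoted parameter count p = md + m = 420 is consistent with an input dimension of d = 20. A sketch of the stated architecture f(x; θ) = √m · W₂σ(W₁x) in PyTorch (bias-free layers and a ReLU σ are our reading of the formula):

```python
import torch
import torch.nn as nn

class TwoLayerNet(nn.Module):
    """f(x; theta) = sqrt(m) * W2 @ relu(W1 @ x), with no bias terms."""
    def __init__(self, d=20, m=20):
        super().__init__()
        self.W1 = nn.Linear(d, m, bias=False)  # m*d = 400 weights
        self.W2 = nn.Linear(m, 1, bias=False)  # m  =  20 weights
        self.scale = m ** 0.5

    def forward(self, x):
        return self.scale * self.W2(torch.relu(self.W1(x))).squeeze(-1)

net = TwoLayerNet()
p = sum(w.numel() for w in net.parameters())
assert p == 20 * 20 + 20 == 420                # p = m*d + m, as quoted
```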