Neural Contextual Bandits with UCB-based Exploration
Authors: Dongruo Zhou, Lihong Li, Quanquan Gu
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate NeuralUCB empirically and compare it with seven representative baselines: (1) LinUCB, which is also based on UCB but adopts a linear representation; (2) GLMUCB (Filippi et al., 2010), which applies a nonlinear link function over a linear function; (3) KernelUCB (Valko et al., 2013), a kernelised UCB algorithm which makes use of a predefined kernel function; (4) Bootstrapped NN (Efron, 1982; Riquelme et al., 2018), which simultaneously trains a set of neural networks using bootstrapped samples and at every round chooses an action based on the prediction of a randomly picked model; (5) Neural ϵ-Greedy, which replaces the UCB-based exploration in Algorithm 1 by ϵ-greedy; (6) NeuralUCB0, as described in Section 3; and (7) Neural ϵ-Greedy0, same as NeuralUCB0 but with ϵ-greedy exploration. We use the cumulative regret as the performance metric. Figures 1 and 2 show the cumulative regret of all algorithms. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science, University of California, Los Angeles, CA 90095, USA 2Google Research, USA. |
| Pseudocode | Yes | Algorithm 1 NeuralUCB; Algorithm 2 TrainNN |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We evaluate our algorithms on real-world datasets from the UCI Machine Learning Repository (Dua & Graff, 2017): covertype, magic, and statlog. We also evaluate our algorithms on the MNIST dataset (LeCun et al., 1998). |
| Dataset Splits | No | The paper describes converting classification datasets into K-armed contextual bandits and reshuffling the order of contexts, but does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts) in the conventional sense. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as exact GPU/CPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | For all five neural algorithms, we choose a two-layer neural network f(x; θ) = √m W₂σ(W₁x) with network width m = 20, where θ = [vec(W₁)⊤, vec(W₂)⊤]⊤ ∈ ℝᵖ and p = md + m = 420. Moreover, we set γₜ = γ in NeuralUCB, and do a grid search over {0.01, 0.1, 1, 10}. For Neural ϵ-Greedy and Neural ϵ-Greedy0, we do a grid search for ϵ over {0.001, 0.01, 0.1, 0.2}. For Bootstrapped NN, we follow Riquelme et al. (2018) to set the number of models to be 10 and the transition probability to be 0.8. To accelerate the training process, for Bootstrapped NN, NeuralUCB and Neural ϵ-Greedy, we update the parameter θₜ by TrainNN every 50 rounds. We use stochastic gradient descent with batch size 50, J = t at round t, and do a grid search for step size η over {0.001, 0.01, 0.1}. |
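The quoted experiment setup pins down the network shape precisely. A minimal sketch of that two-layer network, assuming ReLU for σ and inferring the context dimension d = 20 from p = md + m = 420 with m = 20 (the weight initialization here is an illustrative choice, not the paper's):

```python
import numpy as np

# Sketch of the quoted setup: f(x; theta) = sqrt(m) * W2 @ sigma(W1 @ x)
# with width m = 20. d = 20 is inferred from p = m*d + m = 420.
m, d = 20, 20
rng = np.random.default_rng(0)
W1 = rng.standard_normal((m, d)) / np.sqrt(d)  # first-layer weights
W2 = rng.standard_normal(m) / np.sqrt(m)       # second-layer weights

def f(x):
    """Scalar reward estimate for one context x, with the sqrt(m) scaling."""
    return float(np.sqrt(m) * W2 @ np.maximum(W1 @ x, 0.0))

# theta = [vec(W1); vec(W2)], so the parameter count is p = m*d + m.
p = W1.size + W2.size  # 420, matching the paper
```

The √m output scaling is what makes the neural-tangent-kernel analysis of such bandit algorithms go through; it does not change expressiveness, only the effective learning rate of each layer.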
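The Dataset Splits row notes that classification datasets are converted into K-armed contextual bandits. The paper's excerpt does not spell out the encoding, but the common disjoint construction in this literature looks like the following hedged sketch (function names are hypothetical):

```python
import numpy as np

def make_arm_contexts(x, K):
    """Disjoint encoding: arm k's context places the feature vector x
    in the k-th block of a K*d vector, with zeros elsewhere. This is
    a standard construction, assumed here rather than quoted."""
    d = x.shape[0]
    contexts = np.zeros((K, K * d))
    for k in range(K):
        contexts[k, k * d:(k + 1) * d] = x
    return contexts

def bandit_reward(arm, label):
    """Reward 1 if the pulled arm matches the true class label, else 0."""
    return 1.0 if arm == label else 0.0
```

Under this conversion the per-round regret is simply the number of misclassifications, which is why cumulative regret is a natural metric for these benchmarks.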
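The compared methods differ mainly in how an arm is chosen from the network's scores. A minimal sketch of the two exploration rules contrasted in the baselines (UCB-style optimism versus ϵ-greedy); the bonus terms are taken as given here, whereas NeuralUCB computes them from gradient features:

```python
import numpy as np

def ucb_arm(scores, bonuses, gamma):
    """Optimistic choice: argmax of predicted reward plus a
    gamma-scaled exploration bonus, as in UCB-based selection."""
    return int(np.argmax(np.asarray(scores) + gamma * np.asarray(bonuses)))

def eps_greedy_arm(scores, eps, rng):
    """eps-greedy choice: a uniformly random arm with probability eps,
    otherwise the greedy argmax of the predicted rewards."""
    if rng.random() < eps:
        return int(rng.integers(len(scores)))
    return int(np.argmax(scores))
```

This is why γ and ϵ are the quantities grid-searched in the setup above: each controls how aggressively its rule explores.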