Online Learning in Contextual Bandits using Gated Linear Networks

Authors: Eren Sezener, Marcus Hutter, David Budden, Jianan Wang, Joel Veness

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate GLCB compared to 9 state-of-the-art algorithms that leverage deep neural networks, on a standard benchmark suite of discrete and continuous contextual bandit problems.
Researcher Affiliation | Industry | DeepMind; esezener@google.com
Pseudocode | Yes | Table 1: (Algorithm 1) Perform a forward pass and optionally update weights. (Algorithm 2) GLCB-policy applied for T timesteps.
Open Source Code | Yes | Open source GLN implementations are available at: www.github.com/deepmind/deepmind-research/.
Open Datasets | Yes | Each algorithm is evaluated using seven of the ten contextual bandit problems described in [7]: four discrete tasks (adult, census, covertype and statlog) adapted from classification problems, and three continuous tasks adapted from regression problems (financial, jester and wheel). A summary of each task is provided in Table 2 (Right).
Dataset Splits | No | The paper describes an online learning setting and the total number of timesteps/data points used (e.g., 'until t = T = min{5000, |D|}'), but it does not specify explicit training, validation, and test dataset splits with percentages or sample counts in the main text.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory, or cloud instance types) used for the experiments.
Software Dependencies | No | The paper mentions 'All models implemented using JAX [23] and the DeepMind JAX Ecosystem [24, 25, 26, 27]', but does not specify version numbers for JAX or other software libraries.
Experiment Setup | Yes | We tune two sets of parameters for GLCB using grid search, one for the set of Bernoulli bandit tasks and another for the set of continuous bandit tasks, which we report in the appendix.
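
The 'Dataset Splits' and 'Experiment Setup' entries describe a purely online evaluation protocol: each algorithm passes once through a shuffled dataset for T = min{5000, |D|} steps, observing only the reward of the arm it selects and updating immediately. The sketch below illustrates that kind of loop under stated assumptions; it is not the authors' released GLCB code, and policy.select_arm, policy.update, and reward_fn are hypothetical placeholder interfaces.

```python
# Hypothetical sketch of an online contextual-bandit evaluation loop
# (run for t = 1..T with T = min{5000, |D|}), not the authors' implementation.
import numpy as np

def run_online_bandit(policy, contexts, reward_fn, max_steps=5000, seed=0):
    """Sequentially select arms and update the policy after each observed reward."""
    rng = np.random.default_rng(seed)
    T = min(max_steps, len(contexts))            # T = min{5000, |D|}
    order = rng.permutation(len(contexts))[:T]   # shuffle the dataset once
    cumulative_reward = 0.0
    for idx in order:
        x = contexts[idx]
        arm = policy.select_arm(x)               # assumed action-selection interface
        r = reward_fn(idx, arm)                  # reward observed for the chosen arm only
        policy.update(x, arm, r)                 # single online update, no replay buffer
        cumulative_reward += r
    return cumulative_reward
```

Reported per-task scores would then be cumulative reward (or regret) averaged over several seeds, which matches the online setting noted above even though no train/validation/test splits are specified.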