Online Learning in Contextual Bandits using Gated Linear Networks
Authors: Eren Sezener, Marcus Hutter, David Budden, Jianan Wang, Joel Veness
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate GLCB compared to 9 state-of-the-art algorithms that leverage deep neural networks, on a standard benchmark suite of discrete and continuous contextual bandit problems. |
| Researcher Affiliation | Industry | DeepMind, esezener@google.com |
| Pseudocode | Yes | Table 1: (Algorithm 1) Perform a forward pass and optionally update weights. (Algorithm 2) GLCB-policy applied for T timesteps. |
| Open Source Code | Yes | Open source GLN implementations are available at: www.github.com/deepmind/deepmind-research/. |
| Open Datasets | Yes | Each algorithm is evaluated using seven of the ten contextual bandit problems described in [7]: four discrete tasks (adult, census, covertype and statlog) adapted from classification problems, and three continuous tasks adapted from regression problems (financial, jester and wheel). A summary of each task is provided in Table 2 (Right). |
| Dataset Splits | No | The paper describes an online learning setting and the total number of timesteps/data points used (e.g., 'until t = T = min{5000, |D|}'), but it does not specify explicit training, validation, and test dataset splits with percentages or sample counts in the main text. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory, or cloud instance types) used for the experiments. |
| Software Dependencies | No | The paper mentions 'All models implemented using JAX [23] and the DeepMind JAX Ecosystem [24, 25, 26, 27]', but does not specify version numbers for JAX or other software libraries. |
| Experiment Setup | Yes | We tune two sets of parameters for GLCB using grid search, one for the set of Bernoulli bandit tasks and another for the set of continuous bandit tasks, which we report in the appendix. |
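The evaluation protocol referenced above (online learning over a classification dataset adapted to a bandit, run until t = T = min{5000, |D|}) can be sketched as follows. This is an illustrative epsilon-greedy baseline with per-arm running means, not the paper's GLCB algorithm (which uses Gated Linear Networks for value estimation); the function name and parameters are assumptions for illustration.

```python
import random

def run_bandit(contexts, labels, n_arms, T=5000, eps=0.1, seed=0):
    """Minimal online contextual-bandit evaluation loop (sketch only).

    A classification dataset is adapted to a bandit problem: pulling the
    arm matching the true label yields reward 1, otherwise 0. This sketch
    ignores the context and tracks context-free per-arm mean rewards,
    where GLCB would instead query a Gated Linear Network.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms
    values = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0
    T = min(T, len(labels))   # run until t = T = min{5000, |D|}
    for t in range(T):
        if rng.random() < eps:
            arm = rng.randrange(n_arms)           # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1 if arm == labels[t] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return total_reward
```

With `eps=0` the loop is purely greedy; the `min{5000, |D|}` cap mirrors the timestep limit quoted in the Dataset Splits row.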