Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization

Authors: Thanh Nguyen-Tang, Sunil Gupta, A. Tuan Nguyen, Svetha Venkatesh

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we demonstrate the empirical effectiveness of our method in a range of synthetic and real-world OPL problems."
Researcher Affiliation | Academia | Thanh Nguyen-Tang (Applied AI Institute, Deakin University); Sunil Gupta (Applied AI Institute, Deakin University); A. Tuan Nguyen (Department of Engineering Science, University of Oxford); Svetha Venkatesh (Applied AI Institute, Deakin University)
Pseudocode | Yes | Algorithm 1 NeuraLCB (page 3); Algorithm 2 LinLCB, Algorithm 3 KernLCB, Algorithm 4 NeuralLinLCB, Algorithm 5 NeuralLinGreedy, Algorithm 6 NeuralGreedy, Algorithm 7 NeuraLCB (B-mode) (Appendix D)
Open Source Code | Yes | "Our code repository: https://github.com/thanhnguyentang/offline_neural_bandits."
Open Datasets | Yes | "We evaluate the algorithms on real-world datasets from the UCI Machine Learning Repository (Dua & Graff, 2017): Mushroom, Statlog, and Adult, and MNIST (LeCun et al., 1998)."
Dataset Splits | No | The paper does not explicitly report validation splits or their sizes/percentages, though it describes test contexts: "In each run, we randomly sample nte = 10,000 contexts from ρ and use these same test contexts to approximate the expected sub-optimality of each algorithm."
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU/GPU models, memory, or cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions the "Adam optimizer (Kingma & Ba, 2014)" and "layer normalization (Ba et al., 2016)" but does not give version numbers for these or for any other software dependency, such as Python or specific libraries.
Experiment Setup | Yes | "Hyperparameters. We fix λ = 0.1 for all algorithms. For NeuraLCB, we set β_t = β, and for NeuraLCB, LinLCB, KernLCB, and NeuralLinLCB, we do grid search over {0.01, 0.05, 0.1, 1, 5, 10} for the uncertainty parameter β. For KernLCB, we use the radial basis function (RBF) kernel with parameter σ and do grid search over {0.1, 1, 10} for σ. For NeuraLCB and NeuralGreedy, we use the Adam optimizer (Kingma & Ba, 2014) with learning rate η grid-searched over {0.0001, 0.001} and set the ℓ_2-regularization parameter to 0.0001. For NeuraLCB, for each D_t, we use π̂_t as its final returned policy instead of averaging over all policies {π̂_τ}_{τ=1}^{t}. Moreover, we grid search NeuraLCB and NeuralGreedy over two training modes, namely {S-mode, B-mode}, where at each iteration t, S-mode updates the neural network for one step of SGD (one step of Adam update in practice) on one single data point (x_t, a_t, r_t), while B-mode updates the network for 100 steps of SGD on a random batch of size 50 of data D_t (details at Algorithm 7). [...] For NeuralLinLCB, NeuralLinGreedy, NeuraLCB, and NeuralGreedy, we use the same network architecture with L = 2 and add layer normalization (Ba et al., 2016) in the hidden layers. The network width m will be specified later based on datasets."
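To make the reported setup concrete, here is a minimal sketch (not the authors' code) of the two pieces the hyperparameter description revolves around: enumerating the grid-searched configurations (β, learning rate η, and training mode, using the grids quoted above), and the pessimistic lower-confidence-bound action rule that β controls, i.e. picking the arm maximizing f(x, a) − β·u(x, a). The function names `hyperparameter_grid` and `lcb_action` are illustrative, not from the paper or its repository.

```python
import itertools

# Grids quoted in the report above (the sigma grid for KernLCB is analogous).
BETA_GRID = [0.01, 0.05, 0.1, 1, 5, 10]   # uncertainty parameter beta
LR_GRID = [0.0001, 0.001]                  # Adam learning rate eta
MODES = ["S-mode", "B-mode"]               # per-step vs batched SGD updates

def hyperparameter_grid():
    """Enumerate every (beta, lr, mode) configuration searched."""
    return list(itertools.product(BETA_GRID, LR_GRID, MODES))

def lcb_action(means, uncertainties, beta):
    """Pessimistic arm selection: argmax_a [f(x, a) - beta * u(x, a)].

    `means` are the network's reward estimates f(x, a) and
    `uncertainties` the per-arm uncertainty quantifiers u(x, a);
    larger beta penalizes uncertain arms more heavily.
    """
    scores = [m - beta * u for m, u in zip(means, uncertainties)]
    return max(range(len(scores)), key=scores.__getitem__)
```

A larger β makes the rule more conservative: an arm with a slightly higher mean but much higher uncertainty loses to a well-covered arm, which is the pessimism principle the paper's grid search over β is tuning.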