Neural Contextual Bandits with Deep Representation and Shallow Exploration

Authors: Pan Xu, Zheng Wen, Handong Zhao, Quanquan Gu

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on contextual bandit problems based on real-world datasets, demonstrating better performance and computational efficiency of NeuralLinUCB over LinUCB and existing neural bandit algorithms such as NeuralUCB, which aligns well with our theory.
Researcher Affiliation | Collaboration | Pan Xu, California Institute of Technology (panxu@caltech.edu); Zheng Wen, DeepMind (zhengwen@google.com); Handong Zhao, Adobe Research (hazhao@adobe.com); Quanquan Gu, University of California, Los Angeles (qgu@cs.ucla.edu)
Pseudocode | Yes | Algorithm 1: Deep Representation and Shallow Exploration (NeuralLinUCB) ... Algorithm 2: Update Weight Parameters with Gradient Descent
Open Source Code | No | The paper does not provide a statement or link for open-sourcing the code.
Open Datasets | Yes | Specifically, following the experimental setting in Zhou et al. (2020), we use datasets Statlog (Shuttle), Magic and Covertype from the UCI machine learning repository (Dua & Graff, 2017), and the MNIST dataset from LeCun et al. (1998).
Dataset Splits | No | The paper mentions using datasets for online bandit evaluation but does not report explicit train/validation/test splits.
Hardware Specification | Yes | All numerical experiments were run on a workstation with Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz.
Software Dependencies | No | The paper mentions using a 'ReLU neural network' and 'stochastic gradient descent' but does not specify software versions for libraries like PyTorch, TensorFlow, or scikit-learn.
Experiment Setup | Yes | We use a ReLU neural network defined as in (2.3) with L = 2 and m = 100 for the UCI datasets (Statlog, Magic, Covertype). ... We set the time horizon T = 15,000... We use stochastic gradient descent to optimize the network weights, with a step size η_q = 1e-5 and maximum iteration number n = 1,000. ... the network parameter w is updated every H = 100 rounds... We set λ = 1 and α_t = 0.02 for all algorithms, t ∈ [T].
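The "shallow exploration" in Algorithm 1 runs LinUCB on top of the network's learned feature map. A minimal NumPy sketch of one such round, with the deep feature map abstracted into a precomputed matrix of per-arm features (all function and variable names here are ours, not the paper's):

```python
import numpy as np

def ucb_scores(phis, A_inv, theta, alpha):
    # Ridge-regression mean estimate plus exploration bonus per arm,
    # computed on the deep features phi(x_{t,a}) rather than raw contexts.
    means = phis @ theta
    bonuses = alpha * np.sqrt(np.einsum("km,mn,kn->k", phis, A_inv, phis))
    return means + bonuses

def neural_linucb_round(phis, get_reward, A, b, alpha):
    # phis: (K, m) matrix of features for the K arms this round.
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b            # least-squares solution on collected data
    arm = int(np.argmax(ucb_scores(phis, A_inv, theta, alpha)))
    r = get_reward(arm)
    phi = phis[arm]
    A += np.outer(phi, phi)      # rank-one update of the design matrix
    b += r * phi
    return arm, r
```

A practical implementation would maintain A_inv via the Sherman-Morrison rank-one update instead of re-inverting each round.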
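The K-class classification datasets above are typically converted to contextual bandits via the disjoint-context construction used in this line of work (e.g., Zhou et al., 2020): each sample spawns K arms, arm a placing the feature vector in the a-th block, with reward 1 only when the pulled arm matches the true label. A sketch under that assumption (function names are ours):

```python
import numpy as np

def to_arm_contexts(x, num_classes):
    # Arm a's context is x placed in the a-th block of a d*K-dim vector.
    d = x.shape[0]
    contexts = np.zeros((num_classes, d * num_classes))
    for a in range(num_classes):
        contexts[a, a * d:(a + 1) * d] = x
    return contexts

def bandit_reward(chosen_arm, true_label):
    # Reward 1 for pulling the arm matching the true class, else 0.
    return float(chosen_arm == true_label)
```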
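The weight-update schedule in the setup row (Algorithm 2 in the paper) amounts to periodic plain gradient descent on the empirical loss. A sketch using the reported hyperparameter values, with our own names for them (grad_fn is a placeholder for the loss gradient, which the paper computes through the network):

```python
import numpy as np

# Values quoted from the paper's setup; constant names are ours.
ETA = 1e-5      # SGD step size eta_q
N_ITERS = 1000  # maximum iteration number n
H = 100         # rounds between weight updates

def gradient_descent(w, grad_fn, eta=ETA, n_iters=N_ITERS):
    # Plain gradient descent: repeatedly step against the gradient.
    for _ in range(n_iters):
        w = w - eta * grad_fn(w)
    return w

def should_refit(t, period=H):
    # The network parameter w is refit every H rounds.
    return t > 0 and t % period == 0
```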