Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Neural Contextual Bandits with Deep Representation and Shallow Exploration
Authors: Pan Xu, Zheng Wen, Handong Zhao, Quanquan Gu
ICLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on contextual bandit problems based on real-world datasets, demonstrating a better performance and computational efficiency of Neural-LinUCB over LinUCB and existing neural bandits algorithms such as NeuralUCB, which well aligns with our theory. |
| Researcher Affiliation | Collaboration | Pan Xu, California Institute of Technology, EMAIL; Zheng Wen, DeepMind, EMAIL; Handong Zhao, Adobe Research, EMAIL; Quanquan Gu, University of California, Los Angeles, EMAIL |
| Pseudocode | Yes | Algorithm 1: Deep Representation and Shallow Exploration (Neural-LinUCB) ... Algorithm 2: Update Weight Parameters with Gradient Descent |
| Open Source Code | No | The paper does not provide a statement or link for open-sourcing the code. |
| Open Datasets | Yes | Specifically, following the experimental setting in Zhou et al. (2020), we use datasets (Shuttle) Statlog, Magic and Covertype from UCI machine learning repository (Dua & Graff, 2017), and the MNIST dataset from LeCun et al. (1998). |
| Dataset Splits | No | The paper mentions using |
| Hardware Specification | Yes | All numerical experiments were run on a workstation with Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz. |
| Software Dependencies | No | The paper mentions using a 'ReLU neural network' and 'stochastic gradient descent' but does not specify software versions for libraries like PyTorch, TensorFlow, or scikit-learn. |
| Experiment Setup | Yes | We use a ReLU neural network defined as in (2.3) with L = 2 and m = 100 for the UCI datasets (Statlog, Magic, Covertype). ... We set the time horizon T = 15,000... We use stochastic gradient descent to optimize the network weights, with a step size η_q = 1e-5 and maximum iteration number n = 1,000. ... the network parameter w is updated every H = 100 rounds... We set λ = 1 and α_t = 0.02 for all algorithms, t ∈ [T]. |
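
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, which makes the implied update schedule easy to check. The sketch below is illustrative only and is not the authors' code (the paper releases none). The class name `NeuralLinUCBSetup`, its field names, and the method `network_update_rounds` are hypothetical. Two further assumptions are made: that λ is the ridge regularization parameter, and that the network is refreshed on rounds H, 2H, ..., up to T.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NeuralLinUCBSetup:
    """Hyperparameters quoted in the Experiment Setup row (UCI datasets).

    Hypothetical container; field names are not taken from the paper.
    """
    depth: int = 2              # L: number of layers in the ReLU network
    width: int = 100            # m: hidden-layer width
    horizon: int = 15_000       # T: number of bandit rounds
    step_size: float = 1e-5     # eta_q: SGD step size
    max_sgd_iters: int = 1_000  # n: maximum SGD iterations per update
    update_every: int = 100     # H: rounds between network weight updates
    reg_lambda: float = 1.0     # lambda: assumed to be the ridge regularizer
    alpha: float = 0.02         # alpha_t: exploration coefficient, same for all t

    def network_update_rounds(self) -> List[int]:
        """Rounds at which the network weights w would be refreshed.

        Assumption: updates fall on rounds H, 2H, ..., up to T. The exact
        indexing convention is not stated in the summary above.
        """
        return list(range(self.update_every, self.horizon + 1, self.update_every))


setup = NeuralLinUCBSetup()
updates = setup.network_update_rounds()
# Number of deep-representation updates over the whole horizon (T / H).
print(len(updates))
```

Under these assumptions the network weights are refreshed T / H = 150 times over the horizon, while the linear exploration step runs in every one of the 15,000 rounds.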