Learning Neural Contextual Bandits through Perturbed Rewards

Authors: Yiling Jia, Weitong Zhang, Dongruo Zhou, Quanquan Gu, Hongning Wang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive comparisons with several benchmark contextual bandit algorithms, including two recent neural contextual bandit models, demonstrate the effectiveness and computational efficiency of our proposed neural bandit algorithm. In this section, we empirically evaluate the proposed neural bandit algorithm NPR against several state-of-the-art baselines...
Researcher Affiliation | Academia | Yiling Jia¹, Weitong Zhang², Dongruo Zhou², Quanquan Gu², Hongning Wang¹ (¹Department of Computer Science, University of Virginia; ²Department of Computer Science, University of California, Los Angeles)
Pseudocode | Yes | Algorithm 1: Neural bandit with perturbed reward (NPR) (a hedged sketch of this loop follows the table)
Open Source Code | No | The paper provides no explicit statement about, or link to, open-source code for the described methodology.
Open Datasets | Yes | six K-class classification datasets from the UCI machine learning repository (Beygelzimer et al., 2011) and two real-world datasets extracted from the social bookmarking web service Delicious and the music streaming service LastFM (Wu et al., 2016) (a conversion sketch follows the table)
Dataset Splits | No | The paper describes a hyperparameter grid search, but does not explicitly provide training/validation/test splits (percentages or sample counts) needed for reproducibility.
Hardware Specification | Yes | We implemented all the algorithms in PyTorch and performed all the experiments on a server equipped with an Intel Xeon Gold 6230 2.10GHz CPU, 128G RAM, and four NVIDIA GeForce RTX 2080Ti graphics cards.
Software Dependencies | No | The paper mentions PyTorch as the implementation framework but provides neither a version number nor other software dependencies with version details.
Experiment Setup | Yes | For the neural bandit algorithms, we adopted a 3-layer neural network with m = 64 units in each hidden layer. We did a grid search on the first 1000 rounds for the regularization parameter λ over {10^-i : i = 0, ..., 4} and the step size η over {10^-i : i = 0, ..., 3}. We also searched the concentration parameter δ so that the exploration parameter ν was equivalently searched over {10^-i : i = 0, ..., 4}. (a grid-search sketch follows the table)
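
As a rough, non-authoritative illustration of the perturbed-reward idea behind Algorithm 1 (NPR), the PyTorch sketch below retrains the network each round on the observed history with fresh i.i.d. Gaussian noise (scale ν) added to the rewards, then pulls the arm with the highest predicted reward greedily. The class name NPRAgent, the retraining schedule, and the use of weight decay for the λ regularizer are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """3-layer network with m = 64 hidden units, matching the reported setup."""
    def __init__(self, dim, m=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, m), nn.ReLU(),
            nn.Linear(m, m), nn.ReLU(),
            nn.Linear(m, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

class NPRAgent:
    """Hypothetical sketch of NPR: exploration via training on perturbed rewards."""
    def __init__(self, dim, nu=0.1, lam=1e-2, eta=1e-2, epochs=50):
        self.model = MLP(dim)
        self.nu, self.lam, self.eta, self.epochs = nu, lam, eta, epochs
        self.contexts, self.rewards = [], []

    def select_arm(self, arm_contexts):
        # Greedy choice under the current fit; the reward perturbation during
        # training supplies the exploration, so no UCB bonus or posterior
        # sampling is needed at decision time.
        with torch.no_grad():
            return int(self.model(arm_contexts).argmax())

    def update(self, context, reward):
        self.contexts.append(context)
        self.rewards.append(float(reward))
        X = torch.stack(self.contexts)
        r = torch.tensor(self.rewards)
        # Fresh i.i.d. Gaussian perturbations of the full reward history.
        r_tilde = r + self.nu * torch.randn_like(r)
        # Weight decay stands in for the lambda regularizer here (an
        # assumption; NTK-style analyses typically regularize toward the
        # network's initialization instead).
        opt = torch.optim.SGD(self.model.parameters(),
                              lr=self.eta, weight_decay=self.lam)
        for _ in range(self.epochs):
            opt.zero_grad()
            loss = ((self.model(X) - r_tilde) ** 2).mean()
            loss.backward()
            opt.step()
```

In a hypothetical K-armed round, select_arm would receive a (K × dim) tensor of candidate-arm contexts, and update would then be called with the chosen row and its observed reward.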
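
For the Open Datasets entry: K-class classification datasets are typically turned into contextual bandit problems following Beygelzimer et al. (2011), with reward 1 only when the pulled arm matches the true label. The block-wise context construction below is a common convention in the neural bandit literature and is assumed here rather than taken from the paper.

```python
import numpy as np

def classification_to_bandit(X, y, num_classes, seed=0):
    """Yield (arm_contexts, rewards) rounds from a K-class dataset.

    Assumed construction: arm k's context places the feature vector x in
    block k of a (K * d)-dimensional vector; reward is 1 iff the pulled
    arm equals the true label.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    for i in rng.permutation(len(X)):
        arms = np.zeros((num_classes, num_classes * d))
        for k in range(num_classes):
            arms[k, k * d:(k + 1) * d] = X[i]
        rewards = np.zeros(num_classes)
        rewards[y[i]] = 1.0
        yield arms, rewards
```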
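
And for the Experiment Setup entry, the stated grids are easy to enumerate. The harness below is a sketch that assumes cumulative regret over the first 1000 rounds is the model-selection criterion and that a run_bandit helper exists; both are assumptions, as the paper does not spell out the selection rule.

```python
import itertools

# Grids as reported: lambda and nu over {10^-i, i = 0..4}; eta over {10^-i, i = 0..3}.
LAMBDAS = [10.0 ** -i for i in range(5)]
ETAS = [10.0 ** -i for i in range(4)]
NUS = [10.0 ** -i for i in range(5)]

def grid_search(run_bandit, horizon=1000):
    """run_bandit(lam, eta, nu, horizon) -> cumulative regret (assumed helper)."""
    best_regret, best_cfg = float("inf"), None
    for lam, eta, nu in itertools.product(LAMBDAS, ETAS, NUS):
        regret = run_bandit(lam, eta, nu, horizon)
        if regret < best_regret:
            best_regret, best_cfg = regret, (lam, eta, nu)
    return best_cfg
```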