Learning Neural Contextual Bandits through Perturbed Rewards

Authors: Yiling Jia, Weitong Zhang, Dongruo Zhou, Quanquan Gu, Hongning Wang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive comparisons with several benchmark contextual bandit algorithms, including two recent neural contextual bandit models, demonstrate the effectiveness and computational efficiency of our proposed neural bandit algorithm. In this section, we empirically evaluate the proposed neural bandit algorithm NPR against several state-of-the-art baselines...
Researcher Affiliation | Academia | Yiling Jia¹, Weitong Zhang², Dongruo Zhou², Quanquan Gu², Hongning Wang¹ (¹Department of Computer Science, University of Virginia; ²Department of Computer Science, University of California, Los Angeles)
Pseudocode | Yes | Algorithm 1: Neural bandit with perturbed reward (NPR) (a hedged sketch of this loop follows the table)
Open Source Code | No | The paper provides no explicit statement about, or link to, open-source code for the described methodology.
Open Datasets | Yes | six K-class classification datasets from the UCI machine learning repository (Beygelzimer et al., 2011) and two real-world datasets extracted from the social bookmarking web service Delicious and the music streaming service LastFM (Wu et al., 2016) (a conversion sketch follows the table)
Dataset Splits | No | The paper describes a hyperparameter grid search, but does not explicitly provide training/validation/test splits (percentages or sample counts) needed for reproducibility.
Hardware Specification | Yes | We implemented all the algorithms in PyTorch and performed all the experiments on a server equipped with an Intel Xeon Gold 6230 2.10GHz CPU, 128G RAM, and four NVIDIA GeForce RTX 2080Ti graphics cards.
Software Dependencies | No | The paper mentions PyTorch as the implementation framework but provides neither a version number nor other software dependencies with version details.
Experiment Setup | Yes | For the neural bandit algorithms, we adopted a 3-layer neural network with m = 64 units in each hidden layer. We did a grid search on the first 1000 rounds for the regularization parameter λ over {10^-i : i = 0, ..., 4} and the step size η over {10^-i : i = 0, ..., 3}. We also searched the concentration parameter δ so that the exploration parameter ν was equivalently searched over {10^-i : i = 0, ..., 4}. (a grid-search sketch follows the table)
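
As a rough, non-authoritative illustration of the perturbed-reward idea behind Algorithm 1 (NPR), the PyTorch sketch below retrains the network each round on the observed history with fresh i.i.d. Gaussian noise (scale ν) added to the rewards, then pulls the arm with the highest predicted reward greedily. The class name NPRAgent, the retraining schedule, and the use of weight decay for the λ regularizer are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """3-layer network with m = 64 hidden units, matching the reported setup."""
    def __init__(self, dim, m=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, m), nn.ReLU(),
            nn.Linear(m, m), nn.ReLU(),
            nn.Linear(m, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

class NPRAgent:
    """Hypothetical sketch of NPR: exploration via training on perturbed rewards."""
    def __init__(self, dim, nu=0.1, lam=1e-2, eta=1e-2, epochs=50):
        self.model = MLP(dim)
        self.nu, self.lam, self.eta, self.epochs = nu, lam, eta, epochs
        self.contexts, self.rewards = [], []

    def select_arm(self, arm_contexts):
        # Greedy choice under the current fit; the reward perturbation during
        # training supplies the exploration, so no UCB bonus or posterior
        # sampling is needed at decision time.
        with torch.no_grad():
            return int(self.model(arm_contexts).argmax())

    def update(self, context, reward):
        self.contexts.append(context)
        self.rewards.append(float(reward))
        X = torch.stack(self.contexts)
        r = torch.tensor(self.rewards)
        # Fresh i.i.d. Gaussian perturbations of the full reward history.
        r_tilde = r + self.nu * torch.randn_like(r)
        # Weight decay stands in for the lambda regularizer here (an
        # assumption; NTK-style analyses typically regularize toward the
        # network's initialization instead).
        opt = torch.optim.SGD(self.model.parameters(),
                              lr=self.eta, weight_decay=self.lam)
        for _ in range(self.epochs):
            opt.zero_grad()
            loss = ((self.model(X) - r_tilde) ** 2).mean()
            loss.backward()
            opt.step()
```

In a hypothetical K-armed round, select_arm would receive a (K × dim) tensor of candidate-arm contexts, and update would then be called with the chosen row and its observed reward.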
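
For the Open Datasets entry: K-class classification datasets are typically turned into contextual bandit problems following Beygelzimer et al. (2011), with reward 1 only when the pulled arm matches the true label. The block-wise context construction below is a common convention in the neural bandit literature and is assumed here rather than taken from the paper.

```python
import numpy as np

def classification_to_bandit(X, y, num_classes, seed=0):
    """Yield (arm_contexts, rewards) rounds from a K-class dataset.

    Assumed construction: arm k's context places the feature vector x in
    block k of a (K * d)-dimensional vector; reward is 1 iff the pulled
    arm equals the true label.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    for i in rng.permutation(len(X)):
        arms = np.zeros((num_classes, num_classes * d))
        for k in range(num_classes):
            arms[k, k * d:(k + 1) * d] = X[i]
        rewards = np.zeros(num_classes)
        rewards[y[i]] = 1.0
        yield arms, rewards
```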
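
And for the Experiment Setup entry, the stated grids are easy to enumerate. The harness below is a sketch that assumes cumulative regret over the first 1000 rounds is the model-selection criterion and that a run_bandit helper exists; both are assumptions, as the paper does not spell out the selection rule.

```python
import itertools

# Grids as reported: lambda and nu over {10^-i, i = 0..4}; eta over {10^-i, i = 0..3}.
LAMBDAS = [10.0 ** -i for i in range(5)]
ETAS = [10.0 ** -i for i in range(4)]
NUS = [10.0 ** -i for i in range(5)]

def grid_search(run_bandit, horizon=1000):
    """run_bandit(lam, eta, nu, horizon) -> cumulative regret (assumed helper)."""
    best_regret, best_cfg = float("inf"), None
    for lam, eta, nu in itertools.product(LAMBDAS, ETAS, NUS):
        regret = run_bandit(lam, eta, nu, horizon)
        if regret < best_regret:
            best_regret, best_cfg = regret, (lam, eta, nu)
    return best_cfg
```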