An Efficient Algorithm for Deep Stochastic Contextual Bandits

Authors: Tan Zhu, Guannan Liang, Chunjiang Zhu, Haining Li, Jinbo Bi

AAAI 2021, pp. 11193-11201 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'Extensive experiments have been performed to demonstrate the effectiveness and efficiency of the proposed algorithm on multiple real-world datasets.' and, from the Experiments section, 'We have performed extensive experiments to confirm the effectiveness and computational efficiency of the proposed method, SSGD-SCB with a DNN reward function.'
Researcher Affiliation | Academia | Tan Zhu, Guannan Liang, Chunjiang Zhu, Haining Li, Jinbo Bi; Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA; tan.zhu@uconn.edu, guannan.liang@uconn.edu, chunjiang.zhu@uconn.edu, haining.li@uconn.edu, jinbo.bi@uconn.edu
Pseudocode | Yes | Algorithm 1: SSGD-SCB (an illustrative sketch of the general setting follows the table)
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. It mentions 'Vowpal Wabbit' as a baseline system, not as the implementation of its own method.
Open Datasets | Yes | 'We use the CIFAR-10 dataset (Simonyan and Zisserman 2014), which has been widely used for benchmarking non-convex optimization algorithms.'
Dataset Splits | No | The paper specifies training and test sets: 'For both CIFAR-10 and CIFAR-10+N data, 50K samples are selected as the training set Dtrain while the rest 10K samples form the test set Dtest.' However, it does not explicitly mention a validation set or its split. (A loading sketch follows the table.)
Hardware Specification | Yes | 'We implement all the algorithms in PyTorch and test on a server equipped with Intel Xeon Gold 6150 2.7GHz CPU, 192GB RAM, and an NVIDIA Tesla V100 GPU.'
Software Dependencies | No | The paper mentions 'We implement all the algorithms in PyTorch' but does not give a version number for PyTorch or list any other software dependencies.
Experiment Setup | Yes | 'The reward functions of the above algorithms are modeled by a variant of VGG-11 with batch normalization, which contains 9 weight layers and 9.2 million learnable parameters (see Appendix C for the detailed structure and the hyper-parameter settings).' (A stand-in model sketch follows the table.)
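
The Pseudocode row refers to the paper's Algorithm 1 (SSGD-SCB), which is not reproduced on this page. The minimal sketch below only illustrates the general setting such an algorithm operates in: an epsilon-greedy stochastic contextual bandit whose reward function is a DNN trained by SGD. The network, exploration rate, and loss here are placeholders, not the authors' algorithm.

```python
# Hypothetical epsilon-greedy contextual bandit loop with a DNN reward model
# trained by SGD. This is NOT the paper's SSGD-SCB (Algorithm 1); it only
# sketches the problem setting. All sizes and hyper-parameters are assumed.
import torch
import torch.nn as nn

n_arms, context_dim = 10, 3 * 32 * 32            # assumed CIFAR-10-style contexts
reward_net = nn.Sequential(                       # stand-in DNN reward function
    nn.Flatten(), nn.Linear(context_dim, 256), nn.ReLU(), nn.Linear(256, n_arms)
)
opt = torch.optim.SGD(reward_net.parameters(), lr=0.01)
epsilon = 0.1                                     # assumed exploration rate

def bandit_step(context, env_reward_fn):
    """One round: choose an arm, observe its reward, take one SGD step."""
    scores = reward_net(context)                  # predicted reward per arm, shape (1, n_arms)
    if torch.rand(1).item() < epsilon:
        arm = torch.randint(n_arms, (1,)).item()  # explore uniformly at random
    else:
        arm = scores.argmax(dim=1).item()         # exploit the current estimate
    reward = env_reward_fn(arm)                   # scalar reward for the chosen arm
    loss = (scores[0, arm] - reward) ** 2         # fit only the reward of the played arm
    opt.zero_grad()
    loss.backward()
    opt.step()
    return arm, reward
```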
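
For the Dataset Splits row: the 50K/10K figures quoted from the paper match the standard CIFAR-10 train/test partition, so a plain torchvision load reproduces them. This sketch assumes nothing about the paper's CIFAR-10+N construction or its preprocessing.

```python
# Sketch of the 50K training / 10K test CIFAR-10 split quoted above, using
# torchvision's standard partition. The paper's CIFAR-10+N variant and its
# preprocessing are not reproduced here.
import torchvision
import torchvision.transforms as T

transform = T.ToTensor()                          # assumed minimal preprocessing
d_train = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=transform)
d_test = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=transform)
print(len(d_train), len(d_test))                  # 50000 10000, matching Dtrain/Dtest in the quote
```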
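
For the Experiment Setup row: the paper's reward model is its own VGG-11 variant with batch normalization (9 weight layers, 9.2 million parameters, detailed in its Appendix C, which is not reproduced here). The sketch below uses torchvision's off-the-shelf vgg11_bn adapted to 10 classes only as a rough stand-in; it has more layers and parameters than the authors' variant.

```python
# Rough stand-in for the reward model: torchvision's stock VGG-11 with batch
# normalization, adapted to CIFAR-10's 10 classes. The paper's own variant
# (9 weight layers, 9.2M parameters) is smaller; see its Appendix C.
import torchvision.models as models

model = models.vgg11_bn(num_classes=10)
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{n_params / 1e6:.1f}M learnable parameters")  # larger than the paper's 9.2M
```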