An Efficient Algorithm for Deep Stochastic Contextual Bandits
Authors: Tan Zhu, Guannan Liang, Chunjiang Zhu, Haining Li, Jinbo Bi (pp. 11193-11201)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Extensive experiments have been performed to demonstrate the effectiveness and efficiency of the proposed algorithm on multiple real-world datasets." and "We have performed extensive experiments to confirm the effectiveness and computational efficiency of the proposed method, SSGD-SCB with a DNN reward function." |
| Researcher Affiliation | Academia | Tan Zhu, Guannan Liang, Chunjiang Zhu, Haining Li, Jinbo Bi Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA tan.zhu@uconn.edu, guannan.liang@uconn.edu, chunjiang.zhu@uconn.edu, haining.li@uconn.edu, jinbo.bi@uconn.edu |
| Pseudocode | Yes | Algorithm 1: SSGD-SCB (a hedged sketch of the general bandit setting appears after this table) |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the described methodology. It mentions 'Vowpal Wabbit' as a baseline system, but not for its own implementation. |
| Open Datasets | Yes | We use the CIFAR-10 dataset (Simonyan and Zisserman 2014), which has been widely used for benchmarking non-convex optimization algorithms. |
| Dataset Splits | No | The paper specifies training and test sets: 'For both CIFAR-10 and CIFAR-10+N data, 50K samples are selected as the training set Dtrain while the rest 10K samples form the test set Dtest.' However, it does not explicitly mention a validation set or its split. (A loading snippet for this split appears after this table.) |
| Hardware Specification | Yes | We implement all the algorithms in PyTorch and test on a server equipped with Intel Xeon Gold 6150 2.7GHz CPU, 192GB RAM, and an NVIDIA Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions 'We implement all the algorithms in PyTorch' but does not provide a specific version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The reward functions of the above algorithms are modeled by a variant of VGG-11 with batch normalization, which contains 9 weight layers and 9.2 million learnable parameters (see Appendix C for the detailed structure and the hyper-parameter settings). (A hedged reconstruction of such a model follows this table.) |
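The paper's Algorithm 1 (SSGD-SCB) is presented as pseudocode in the PDF and is not reproduced here. For orientation only, the following is a minimal sketch of the general setting such an algorithm operates in: a contextual bandit whose per-arm reward estimates come from an SGD-trained neural network. The epsilon-greedy exploration, the tiny `reward_net`, and all hyper-parameters are illustrative assumptions, not SSGD-SCB's actual exploration scheme or model.

```python
# Hypothetical sketch of a deep contextual bandit round with an
# SGD-trained neural reward model. This is NOT the paper's SSGD-SCB;
# epsilon-greedy is used only as a generic stand-in for exploration.
import random
import torch
import torch.nn as nn

n_arms = 10     # assumption: one arm per CIFAR-10 class
epsilon = 0.05  # assumption: fixed exploration rate

reward_net = nn.Sequential(       # tiny stand-in for the paper's VGG variant
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 256), nn.ReLU(),
    nn.Linear(256, n_arms),       # one reward estimate per arm
)
optimizer = torch.optim.SGD(reward_net.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

def bandit_round(context: torch.Tensor, env_reward_fn) -> None:
    """One round: pick an arm for the context, observe reward, SGD step."""
    with torch.no_grad():
        estimates = reward_net(context)       # predicted reward per arm
    if random.random() < epsilon:
        arm = random.randrange(n_arms)        # explore
    else:
        arm = int(estimates.argmax(dim=1))    # exploit the current model
    reward = env_reward_fn(arm)               # scalar reward from environment
    pred = reward_net(context)[0, arm]        # re-run with grad enabled
    loss = loss_fn(pred, torch.tensor(float(reward)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In a CIFAR-10-as-bandit setup of the kind the experiments describe, `env_reward_fn` would typically return 1 when the chosen arm matches the image's true label and 0 otherwise.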
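The 50K/10K partition quoted in the Dataset Splits row is the standard CIFAR-10 train/test split, which torchvision exposes directly. A minimal loading sketch, assuming torchvision is the loader (the paper only states PyTorch):

```python
# Standard CIFAR-10 partition: 50K training / 10K test images,
# matching the Dtrain/Dtest split quoted from the paper.
# torchvision as the loader is an assumption, not stated in the paper.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
d_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)
d_test = datasets.CIFAR10(root="./data", train=False, download=True, transform=to_tensor)
assert len(d_train) == 50_000 and len(d_test) == 10_000
```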
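The Experiment Setup row cites a VGG-11 variant with batch normalization, 9 weight layers, and 9.2 million learnable parameters, with the exact structure deferred to the paper's Appendix C. The sketch below is one plausible reconstruction, not the paper's actual architecture: it keeps the 8 batch-normalized convolutional layers of torchvision's `vgg11_bn` and swaps the stock three-layer classifier for a single linear head, giving 9 weight layers and roughly 9.2M parameters.

```python
# One plausible (assumed) reconstruction of a 9-weight-layer VGG-11+BN
# variant: 8 conv layers from torchvision's vgg11_bn plus one linear head.
# The paper's exact Appendix C structure may differ.
import torch.nn as nn
from torchvision.models import vgg11_bn

base = vgg11_bn()  # stock model: 8 conv layers + 3-layer classifier
model = nn.Sequential(
    base.features,             # 8 batch-normalized conv layers
    nn.AdaptiveAvgPool2d(1),   # 32x32 CIFAR input -> 512 x 1 x 1 features
    nn.Flatten(),
    nn.Linear(512, 10),        # single linear head, one output per arm
)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.2f}M parameters")  # ~9.23M, close to the quoted 9.2M
```

The near-match in parameter count makes this a reasonable stand-in, but it is not confirmed against Appendix C.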