Langevin Monte Carlo for Contextual Bandits
Authors: Pan Xu, Hongkai Zheng, Eric V Mazumdar, Kamyar Azizzadenesheli, Animashree Anandkumar
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on both synthetic data and real-world datasets on different contextual bandit models, which demonstrates that directly sampling from the posterior is both computationally efficient and competitive in performance. |
| Researcher Affiliation | Academia | 1Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA 2Department of Computer Science, Purdue University, West Lafayette, IN, USA. |
| Pseudocode | Yes | Algorithm 1 Langevin Monte Carlo Thompson Sampling (LMC-TS); a hedged code sketch of this sampling loop appears below the table. |
| Open Source Code | Yes | Our implementation can be found at https://github.com/devzhk/LMCTS. |
| Open Datasets | Yes | We conduct experiments on both synthetic datasets and real-world datasets (UCI machine learning datasets and a high dimensional image dataset CIFAR10)... |
| Dataset Splits | No | The paper mentions using synthetic datasets, UCI machine learning datasets, and CIFAR10. It describes how context vectors are constructed for classification datasets but does not provide specific train/validation/test split percentages or counts, nor does it refer to predefined splits with citations for reproducibility. |
| Hardware Specification | Yes | All experiments are conducted on Amazon EC2 P3 instances with NVIDIA V100 GPUs and Broadwell E5-2686 v4 processors. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python version, specific library versions like PyTorch or TensorFlow). |
| Experiment Setup | Yes | For LMC-TS, we set the step size ηₜ = η₀/t as suggested in our theory and do a grid search for the constant η₀ and the temperature parameter β⁻¹. We fix the epoch length for the inner loop of our algorithm as Kₜ = 100 for all t... Neural networks are all updated by 100 gradient descent steps every round. (An illustrative sketch of this schedule and grid search follows the table.) |
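
The Pseudocode row above cites Algorithm 1 (LMC-TS): at each round, the parameter is refreshed with a few noisy gradient (Langevin) steps on the loss over the observed history, and the arm with the highest estimated reward under that sample is pulled. The following is a minimal sketch for a linear contextual bandit under assumed names (`lmcts_linear`, `contexts_fn`, `reward_fn`) and a regularized squared loss; it is not the authors' implementation, which lives at the linked GitHub repository.

```python
import numpy as np

def lmcts_linear(contexts_fn, reward_fn, T, d, eta0=0.01, beta_inv=0.01, K=100, lam=1.0, rng=None):
    """Sketch of LMC-TS (Algorithm 1) for a linear contextual bandit.

    Each round runs K noisy gradient (Langevin) steps on the regularized
    squared loss over the history, then pulls the arm with the highest
    estimated reward under the sampled parameter.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.zeros(d)
    X, y = [], []                                  # pulled contexts and observed rewards
    for t in range(1, T + 1):
        eta_t = eta0 / t                           # step size schedule eta_t = eta_0 / t
        if X:
            A, b = np.array(X), np.array(y)
            for _ in range(K):                     # Langevin Monte Carlo inner loop
                grad = A.T @ (A @ theta - b) + lam * theta
                noise = rng.standard_normal(d)
                theta = theta - eta_t * grad + np.sqrt(2.0 * eta_t * beta_inv) * noise
        arms = contexts_fn(t)                      # candidate contexts, shape (num_arms, d)
        a = int(np.argmax(arms @ theta))           # greedy w.r.t. the posterior sample
        X.append(arms[a])
        y.append(reward_fn(t, a))
    return theta
```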
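
Building on the sketch above, the snippet below mirrors the Experiment Setup row: step size ηₜ = η₀/t, a grid search over the constant η₀ and the temperature β⁻¹, and a fixed inner-loop length K = 100. The grid values and the synthetic linear bandit are assumptions for illustration only, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_arms, T = 10, 5, 200
theta_star = rng.standard_normal(d) / np.sqrt(d)           # unknown reward parameter
arms_per_round = rng.standard_normal((T + 1, num_arms, d))  # synthetic contexts

def contexts_fn(t):
    return arms_per_round[t]

def reward_fn(t, a):
    return float(arms_per_round[t, a] @ theta_star + 0.1 * rng.standard_normal())

best = None
for eta0 in [1e-1, 1e-2, 1e-3]:                             # assumed grid for eta_0
    for beta_inv in [1e-1, 1e-2, 1e-3]:                     # assumed grid for beta^{-1}
        theta_hat = lmcts_linear(contexts_fn, reward_fn, T, d,
                                 eta0=eta0, beta_inv=beta_inv, K=100, rng=rng)
        err = float(np.linalg.norm(theta_hat - theta_star))
        if best is None or err < best[0]:
            best = (err, eta0, beta_inv)
print("best (parameter error, eta0, beta_inv):", best)
```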