Variational Bayesian Optimistic Sampling
Authors: Brendan O'Donoghue, Tor Lattimore
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Figure 2 we show how five agents perform on a 50 × 50 randomly generated game in self-play and against a best-response opponent. |
| Researcher Affiliation | Industry | Brendan O'Donoghue, DeepMind, bodonoghue@deepmind.com; Tor Lattimore, DeepMind, lattimore@deepmind.com |
| Pseudocode | Yes | Algorithm 1 TS for bandits |
| Open Source Code | No | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] We included a description of the data generation process for the simulations we ran. |
| Open Datasets | No | The entries of R were sampled from prior N(0, 1) and the noise term η_t at each time-period was also sampled from N(0, 1). |
| Dataset Splits | No | The paper describes data generation processes (e.g., 'entries of R were sampled from prior N(0, 1)') but does not specify explicit train/validation/test dataset splits for reproducibility. |
| Hardware Specification | No | 3. If you ran experiments... (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] In the appendix. (Specific details are not present in this excerpt.) |
| Software Dependencies | No | The paper does not explicitly provide specific software dependencies with version numbers needed to replicate the experiment. |
| Experiment Setup | Yes | The entries of R were sampled from prior N(0, 1) and the noise term η_t at each time-period was also sampled from N(0, 1). ... For this experiment we set C = 10... We compare the K-learning and UCB algorithms...over 8 seeds. |
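
The data generation quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `make_game` and `noisy_payoff`, the seeds, and the additive observation model (payoff entry plus noise) are assumptions made here. The 50 × 50 size comes from the Research Type row, and only the standard library is used.

```python
import random


def make_game(n_rows=50, n_cols=50, seed=0):
    """Sample a payoff matrix R whose entries are i.i.d. N(0, 1) draws."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]


def noisy_payoff(R, i, j, rng):
    """Return R[i][j] plus N(0, 1) noise (assumed additive observation model)."""
    return R[i][j] + rng.gauss(0.0, 1.0)


# Hypothetical usage: one game, one noisy observation for the action pair (3, 7).
R = make_game()
noise_rng = random.Random(1)
obs = noisy_payoff(R, 3, 7, noise_rng)
```

Repeating the run with several values of `seed` would mirror the "over 8 seeds" protocol quoted above, though the seeds actually used are not given in the excerpt.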