Society of Agents: Regret Bounds of Concurrent Thompson Sampling
Authors: Yan Chen, Perry Dong, Qinxun Bai, Maria Dimakopoulou, Wei Xu, Zhengyuan Zhou
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we describe our empirical study of concurrent Thompson Sampling on a Markov Decision Process with a Dirichlet transition function and a normal reward function for both finite-horizon and infinite-horizon cases. In particular, we train the concurrent Thompson Sampling agents on randomly sampled MDPs and evaluate the regrets of these agents at the end of each episode/epoch on a fixed number of environments sampled from the ground truth distribution. The simulation settings are moved to appendix due to page limit. Some of the experiment plots are shown in Figures 1a, 1b, 2a, and 2b. |
| Researcher Affiliation | Collaboration | Yan Chen (Duke), Perry Dong (UC Berkeley), Qinxun Bai (Horizon Robotics), Maria Dimakopoulou (Spotify), Wei Xu (Horizon Robotics), Zhengyuan Zhou (Arena Technologies; NYU Stern) |
| Pseudocode | Yes | Algorithm 1: Concurrent PSRL; Algorithm 2: Concurrent Infinite-Horizon Posterior Sampling MDP |
| Open Source Code | No | The paper does not provide any explicit statements or links to open-source code for the described methodology. |
| Open Datasets | No | The paper mentions training on "randomly sampled MDPs" and evaluating on "environments sampled from the ground truth distribution" but does not provide concrete access information (e.g., specific dataset names, links, or citations) for a publicly available dataset. |
| Dataset Splits | No | The paper mentions training and evaluation but does not specify exact training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python, PyTorch, or other libraries/solvers with their versions). |
| Experiment Setup | No | The paper states: "The simulation settings are moved to appendix due to page limit." This indicates that the details exist, but they are not provided in the main text as required. |
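Because no code is released, the following is only a minimal sketch of how the concurrent PSRL loop (Algorithm 1) might be simulated in the tabular finite-horizon setting described under Research Type. It is not the authors' implementation. The priors are assumptions, since the paper's actual settings are in its appendix: a Dirichlet(1) prior over each transition distribution, and a Normal(0, 1) prior over each mean reward with known unit reward noise. It also assumes that all agents sample from the same posterior at the start of an episode and that their pooled data update it only at the end. All names (`SharedPosterior`, `plan`, `concurrent_psrl_episode`, and so on) are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

class SharedPosterior:
    """Posterior shared by all agents (hypothetical priors, see lead-in).

    Transitions: Dirichlet(alpha[s][a]) over next states.
    Rewards: Normal prior on each mean reward, known unit noise variance.
    """
    def __init__(self, n_states, n_actions, prior_alpha=1.0, prior_mean=0.0, prior_prec=1.0):
        self.S, self.A = n_states, n_actions
        self.alpha = [[[prior_alpha] * n_states for _ in range(n_actions)] for _ in range(n_states)]
        self.mean = [[prior_mean] * n_actions for _ in range(n_states)]
        self.prec = [[prior_prec] * n_actions for _ in range(n_states)]

    def sample_mdp(self, rng):
        # Thompson step: draw one plausible MDP (P, R) from the posterior.
        P = [[sample_dirichlet(self.alpha[s][a], rng) for a in range(self.A)] for s in range(self.S)]
        R = [[rng.gauss(self.mean[s][a], self.prec[s][a] ** -0.5) for a in range(self.A)] for s in range(self.S)]
        return P, R

    def update(self, s, a, r, s_next):
        self.alpha[s][a][s_next] += 1.0  # Dirichlet count update
        p = self.prec[s][a]
        self.mean[s][a] = (p * self.mean[s][a] + r) / (p + 1.0)  # Normal-Normal update, unit noise
        self.prec[s][a] = p + 1.0

def plan(P, R, horizon):
    """Finite-horizon backward induction; returns policy[h][s] -> action."""
    S, A = len(R), len(R[0])
    V = [0.0] * S
    policy = []
    for _ in range(horizon):
        Q = [[R[s][a] + sum(P[s][a][t] * V[t] for t in range(S)) for a in range(A)] for s in range(S)]
        pi = [max(range(A), key=lambda a, s=s: Q[s][a]) for s in range(S)]
        V = [Q[s][pi[s]] for s in range(S)]
        policy.append(pi)
    policy.reverse()  # policy[0] is the first step of the episode
    return policy

def concurrent_psrl_episode(post, true_P, true_R, n_agents, horizon, s0, rng):
    """One episode of K concurrent agents; the posterior is updated only after all agents finish."""
    batch, total_reward = [], 0.0
    for _ in range(n_agents):
        P, R = post.sample_mdp(rng)  # each agent draws its own MDP from the same posterior
        pi = plan(P, R, horizon)
        s = s0
        for h in range(horizon):
            a = pi[h][s]
            r = rng.gauss(true_R[s][a], 1.0)
            s_next = rng.choices(range(post.S), weights=true_P[s][a])[0]
            batch.append((s, a, r, s_next))
            total_reward += r
            s = s_next
    for s, a, r, s_next in batch:  # pool every agent's data into the shared posterior
        post.update(s, a, r, s_next)
    return total_reward

# Usage: a toy 2-state, 2-action environment (illustrative values only).
rng = random.Random(0)
true_P = [[[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.1, 0.9]]]
true_R = [[0.0, 1.0], [1.0, 0.0]]
post = SharedPosterior(2, 2)
for episode in range(20):
    concurrent_psrl_episode(post, true_P, true_R, n_agents=4, horizon=5, s0=0, rng=rng)
```

The defining design choice is that every agent draws its own MDP from the same posterior within an episode. This gives the agents diverse exploration, while the shared posterior still pools all of their data. Measuring regret as in the paper would additionally require computing the optimal value under each sampled ground-truth environment, which this sketch omits.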