Society of Agents: Regret Bounds of Concurrent Thompson Sampling

Authors: Yan Chen, Perry Dong, Qinxun Bai, Maria Dimakopoulou, Wei Xu, Zhengyuan Zhou

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we describe our empirical study of concurrent Thompson Sampling on a Markov Decision Process with a Dirichlet transition function and a normal reward function for both finite-horizon and infinite-horizon cases. In particular, we train the concurrent Thompson Sampling agents on randomly sampled MDPs and evaluate the regrets of these agents at the end of each episode/epoch on a fixed number of environments sampled from the ground truth distribution. The simulation settings are moved to appendix due to page limit. Some of the experiment plots are shown in Figures 1a, 1b, 2a, and 2b.
Researcher Affiliation | Collaboration | Yan Chen (Duke), Perry Dong (UC Berkeley), Qinxun Bai (Horizon Robotics), Maria Dimakopoulou (Spotify), Wei Xu (Horizon Robotics), Zhengyuan Zhou (Arena Technologies; NYU Stern)
Pseudocode | Yes | Algorithm 1: Concurrent PSRL; Algorithm 2: Concurrent Infinite-Horizon Posterior Sampling MDP
Open Source Code | No | The paper does not provide any explicit statements or links to open-source code for the described methodology.
Open Datasets | No | The paper mentions training on "randomly sampled MDPs" and evaluating on "environments sampled from the ground truth distribution" but does not provide concrete access information (e.g., specific dataset names, links, or citations) for a publicly available dataset.
Dataset Splits | No | The paper mentions training and evaluation but does not specify exact training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python, PyTorch, or other libraries/solvers with their versions).
Experiment Setup | No | The paper states: "The simulation settings are moved to appendix due to page limit." This indicates that the details exist but are not provided in the main text, as required.
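
Since no code is released and the simulation settings live in the appendix, the experimental setup summarized above (concurrent Thompson Sampling on an MDP with Dirichlet transitions and normal rewards, with regret evaluated after each episode) can only be illustrated with a sketch, not reproduced. The following is a minimal finite-horizon version written under stated assumptions, not the authors' implementation. It assumes a tabular MDP, a Dirichlet(1, ..., 1) prior on each transition row, a N(0, 1) prior on mean rewards with unit-variance reward noise, and agents that pool their data at the end of every episode. All function names and the default sizes (`S`, `A`, `H`, `K`, `episodes`) are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def plan(P, R, H):
    """Backward induction on MDP (P, R); returns a time-dependent greedy policy pi[h][s]."""
    S, A = len(R), len(R[0])
    V = [0.0] * S
    pi = [[0] * S for _ in range(H)]
    for h in reversed(range(H)):
        new_V = [0.0] * S
        for s in range(S):
            q = [R[s][a] + sum(P[s][a][t] * V[t] for t in range(S)) for a in range(A)]
            pi[h][s] = max(range(A), key=q.__getitem__)
            new_V[s] = q[pi[h][s]]
        V = new_V
    return pi

def evaluate(pi, P, R, H, s0=0):
    """Expected H-step return of policy pi in MDP (P, R), starting from state s0."""
    S = len(R)
    V = [0.0] * S
    for h in reversed(range(H)):
        V = [R[s][pi[h][s]] + sum(P[s][pi[h][s]][t] * V[t] for t in range(S))
             for s in range(S)]
    return V[s0]

def concurrent_psrl(S=3, A=2, H=5, K=4, episodes=30, seed=0):
    """Run K concurrent posterior-sampling agents; return summed regret per episode."""
    rng = random.Random(seed)
    # Ground-truth MDP drawn from the assumed prior.
    P_true = [[sample_dirichlet([1.0] * S, rng) for _ in range(A)] for _ in range(S)]
    R_true = [[rng.gauss(0.0, 1.0) for _ in range(A)] for _ in range(S)]
    v_star = evaluate(plan(P_true, R_true, H), P_true, R_true, H)
    # Shared posterior statistics, updated only between episodes.
    counts = [[[1.0] * S for _ in range(A)] for _ in range(S)]  # Dirichlet parameters
    n = [[0] * A for _ in range(S)]                             # visit counts
    r_sum = [[0.0] * A for _ in range(S)]                       # summed observed rewards
    regrets = []
    for _ in range(episodes):
        batch, episode_regret = [], 0.0
        for _ in range(K):
            # Each agent draws its own MDP from the shared posterior.
            P_hat = [[sample_dirichlet(counts[s][a], rng) for a in range(A)]
                     for s in range(S)]
            # Normal-normal conjugacy: posterior is N(r_sum / (n + 1), 1 / (n + 1)).
            R_hat = [[rng.gauss(r_sum[s][a] / (n[s][a] + 1),
                                (1.0 / (n[s][a] + 1)) ** 0.5)
                      for a in range(A)] for s in range(S)]
            pi = plan(P_hat, R_hat, H)
            episode_regret += v_star - evaluate(pi, P_true, R_true, H)
            # Roll out one episode in the true MDP and buffer the transitions.
            s = 0
            for h in range(H):
                a = pi[h][s]
                r = rng.gauss(R_true[s][a], 1.0)
                s_next = rng.choices(range(S), weights=P_true[s][a])[0]
                batch.append((s, a, r, s_next))
                s = s_next
        # Pool all agents' data into the shared posterior at the end of the episode.
        for s, a, r, s_next in batch:
            counts[s][a][s_next] += 1.0
            n[s][a] += 1
            r_sum[s][a] += r
        regrets.append(episode_regret)
    return regrets
```

Calling `concurrent_psrl()` returns one summed regret value per episode, which could be accumulated and plotted against the episode index. This sketch scores each policy on a single ground-truth MDP, whereas the paper evaluates on a fixed number of environments sampled from the ground-truth distribution; averaging over several seeds would be one way to approximate that.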