Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning

Authors: Amin Karbasi, Nikki Lijing Kuang, Yian Ma, Siddharth Mitra

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We complement our theoretical findings with experimental results. Section 7, Experiments: In this section, we perform empirical studies in simulated environments for bandit and RL to corroborate our theoretical findings. By comparing the actual regret (average rewards) and the number of batches for interaction (maximum policy switches), we show Langevin TS algorithms empowered by LMC methods achieve appealing statistical accuracy with low communication cost. (A generic batched Thompson sampling sketch appears after this table.)
Researcher Affiliation | Academia | (1) Department of Electrical Engineering, Yale University, New Haven, USA; (2) Department of Computer Science, Yale University, New Haven, USA; (3) Department of Statistics and Data Science, Yale University, New Haven, USA; (4) Department of Computer Science and Engineering, University of California San Diego, La Jolla, USA; (5) Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, USA.
Pseudocode | Yes | Algorithm 1 SGLD with Batched Data ... Algorithm 2 Batched Langevin Thompson Sampling (BLTS) ... Algorithm 3 Langevin PSRL (LPSRL) ... Algorithm 4 Mirrored Langevin Dynamics (MLD). (An illustrative SGLD sketch appears after this table.)
Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We simulate a Gaussian bandit environment with N = 15 arms. ... To demonstrate the applicability of Langevin TS in scenarios where posteriors are intractable, we construct a Laplace bandit environment with N = 10 arms. ... In the MDP setting, we consider a variant of the River Swim environment (Strehl and Littman, 2008), which is a common testbed for provable RL methods. (A sketch of the Gaussian bandit environment follows the table.)
Dataset Splits | No | The paper conducts experiments on simulated environments (Gaussian bandits, Laplace bandits, River Swim) rather than using pre-defined datasets with explicit training/validation/test splits. Performance is evaluated over a time horizon T.
Hardware Specification | No | The paper mentions running 'empirical studies in simulated environments' but does not provide any specific details about the hardware used for these simulations (e.g., CPU or GPU models, memory, or cloud instances).
Software Dependencies | No | The paper mentions using specific algorithms such as SGLD and MLD and notes that additional experimental details are in Appendix F. However, Appendix F describes environment parameters and prior settings but does not list specific versions of software libraries (e.g., PyTorch, TensorFlow) or the programming language used for implementation.
Experiment Setup | No | The paper provides theoretical specifications for algorithm parameters (e.g., 'with the SGLD parameters specified as per Algorithm 1 and with γ = O(1/dκ^3)' or 'setting the hyperparameters as per Theorem 1'). It also details environment parameters such as the number of arms, reward distributions, and prior settings (e.g., 'N = 15 arms', 'expected rewards ... evenly spaced in [1, 20]', 'Gaussian priors ... means evenly spaced in [14, 20], and inverted variance ... set to 0.375'). However, it does not provide concrete numerical values for the learning algorithm's operational hyperparameters (e.g., a precise learning rate for SGLD or an iteration count given only in O(·) notation) that would typically be needed to reproduce the experimental setup.
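The Open Datasets and Experiment Setup rows quote the Gaussian bandit configuration used in the paper's simulations: N = 15 arms, expected rewards evenly spaced in [1, 20], and Gaussian priors with means evenly spaced in [14, 20] and inverted variance 0.375. The following is a minimal sketch of such an environment; the class interface, unit reward noise, and seeding are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

class GaussianBandit:
    """Simulated Gaussian bandit matching the quoted configuration:
    N = 15 arms with expected rewards evenly spaced in [1, 20].
    The unit reward noise and the seed are illustrative assumptions."""

    def __init__(self, n_arms=15, reward_std=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.means = np.linspace(1.0, 20.0, n_arms)   # expected rewards in [1, 20]
        self.reward_std = reward_std

    def pull(self, arm):
        """Draw one noisy reward for the chosen arm."""
        return self.rng.normal(self.means[arm], self.reward_std)

# Gaussian priors as quoted in the Experiment Setup row:
# means evenly spaced in [14, 20], inverted variance (precision) 0.375 per arm.
prior_means = np.linspace(14.0, 20.0, 15)
prior_precisions = np.full(15, 0.375)
```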
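The Pseudocode row lists Algorithm 1 (SGLD with Batched Data). For orientation only, the sketch below shows a standard stochastic gradient Langevin dynamics update for drawing an approximate posterior sample of one arm's mean reward; the Gaussian likelihood, step size, iteration count, and minibatch size are assumptions made for illustration and do not reproduce the paper's exact Algorithm 1 or its O(·) parameter choices.

```python
import numpy as np

def sgld_posterior_sample(rewards, prior_mean, prior_precision,
                          noise_precision=1.0, step_size=1e-3,
                          n_iters=200, minibatch=32, rng=None):
    """Approximate posterior sample of one arm's mean reward via SGLD.
    Assumes a Gaussian likelihood; step size, iteration count, and
    minibatch size are placeholders, not the paper's tuned values."""
    rng = rng or np.random.default_rng()
    rewards = np.asarray(rewards, dtype=float)
    n = len(rewards)
    theta = float(prior_mean)                 # start from the prior mean
    for _ in range(n_iters):
        batch = rng.choice(rewards, size=min(minibatch, n), replace=False)
        # Stochastic gradient of the log posterior:
        #   grad log prior + (n / |batch|) * sum of grad log likelihood
        grad_prior = -prior_precision * (theta - prior_mean)
        grad_lik = noise_precision * np.sum(batch - theta) * (n / len(batch))
        grad = grad_prior + grad_lik
        # Langevin update: half-gradient step plus injected Gaussian noise
        theta += 0.5 * step_size * grad + np.sqrt(step_size) * rng.standard_normal()
    return theta
```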
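Finally, the Research Type row refers to comparing regret against the number of batches of interaction. The loop below is a generic batched Thompson sampling skeleton over the GaussianBandit sketch above, with exact conjugate Gaussian posterior updates standing in for the Langevin sampler and a doubling batch schedule standing in for the paper's communication schedule; both substitutions are assumptions made for exposition and are not claimed to be the paper's BLTS.

```python
import numpy as np

def batched_thompson_sampling(env, prior_means, prior_precisions,
                              horizon=10_000, noise_precision=1.0, seed=0):
    """Batched TS skeleton: one posterior sample per batch, the chosen arm
    is played for the whole batch, and batch lengths double, which keeps
    the number of batches logarithmic in the horizon."""
    rng = np.random.default_rng(seed)
    mu = prior_means.astype(float).copy()
    lam = prior_precisions.astype(float).copy()
    best_mean = env.means.max()
    regret, t, n_batches, batch_len = 0.0, 0, 0, 1
    while t < horizon:
        # Sample a mean for every arm from the current posterior,
        # then commit to the greedy arm for the entire batch.
        theta = rng.normal(mu, 1.0 / np.sqrt(lam))
        arm = int(np.argmax(theta))
        k = min(batch_len, horizon - t)
        rewards = [env.pull(arm) for _ in range(k)]
        # Conjugate Gaussian posterior update for the pulled arm.
        lam_new = lam[arm] + k * noise_precision
        mu[arm] = (lam[arm] * mu[arm] + noise_precision * sum(rewards)) / lam_new
        lam[arm] = lam_new
        regret += k * (best_mean - env.means[arm])
        t += k
        n_batches += 1
        batch_len *= 2  # doubling schedule: O(log T) batches
    return regret, n_batches

# Usage with the GaussianBandit sketch above:
# regret, n_batches = batched_thompson_sampling(GaussianBandit(), prior_means, prior_precisions)
```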