Regime Switching Bandits

Authors: Xiang Zhou, Yi Xiong, Ningyuan Chen, Xuefeng GAO

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct proof-of-concept experiments to illustrate the performance of the learning algorithm. In Figure 1(a), we plot the average regret versus T of different algorithms in log-log scale, where the number of runs for each algorithm is 500. |
| Researcher Affiliation | Academia | Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong (1911606962@qq.com); Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong (yxiong@se.cuhk.edu.hk); The Rotman School of Management, University of Toronto (ningyuan.chen@utoronto.ca); Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong (xfgao@se.cuhk.edu.hk) |
| Pseudocode | Yes | Algorithm 1: Spectral estimation of (µ, P) from the observations from the exploration phase [2, 3, 8]. Algorithm 2: The SEEU Algorithm. |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper describes a simulated experiment setting with specified parameters (P, µ) but does not mention the use of any publicly available or open datasets, nor does it provide access information for any dataset. |
| Dataset Splits | No | The paper describes a simulated experiment setting but does not provide specific details on training, validation, or test dataset splits. |
| Hardware Specification | Yes | The numerical experiments are conducted on a PC with 3.10 GHz Intel Processor and 16 GB of RAM. |
| Software Dependencies | No | The paper mentions general tools but does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | Input: Initial belief b1, precision δ, exploration parameter τ1, exploitation parameter τ2. For the example above, we calculate the average regret for several pairs of parameters (τ1, τ2). It can be seen that the choices of these parameters do not affect the order O(T^{2/3}) of the regret (the slope). See Figure 1(b) for an illustration. |
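
The Experiment Setup entry reads the regret order off the slope of a log-log plot. A minimal sketch of that reading, assuming a plain least-squares fit of log(regret) against log(T): the function name `loglog_slope`, the horizons, and the constants 2.0 and 5.0 are hypothetical, and the two curves are synthetic stand-ins for different (τ1, τ2) choices, not results from the paper.

```python
import math

def loglog_slope(horizons, regrets):
    """Least-squares slope of log(regret) against log(T)."""
    xs = [math.log(t) for t in horizons]
    ys = [math.log(r) for r in regrets]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the best-fit line through the (log T, log regret) points.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic data (assumption): two curves of the form c * T^(2/3) that
# differ only in the constant c, mimicking different parameter choices.
horizons = [1e3, 1e4, 1e5, 1e6]
curve_a = [2.0 * t ** (2.0 / 3.0) for t in horizons]
curve_b = [5.0 * t ** (2.0 / 3.0) for t in horizons]

slope_a = loglog_slope(horizons, curve_a)
slope_b = loglog_slope(horizons, curve_b)
print(f"slope_a = {slope_a:.4f}, slope_b = {slope_b:.4f}")
```

On a log-log plot a change in the constant shifts the line vertically without changing its slope, which is why curves for different parameter pairs can share the same order while sitting at different heights.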