Regime Switching Bandits
Authors: Xiang Zhou, Yi Xiong, Ningyuan Chen, Xuefeng Gao
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct proof-of-concept experiments to illustrate the performance of the learning algorithm. In Figure 1(a), we plot the average regret versus T of different algorithms in log-log scale, where the number of runs for each algorithm is 500. |
| Researcher Affiliation | Academia | Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong (1911606962@qq.com); Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong (yxiong@se.cuhk.edu.hk); The Rotman School of Management, University of Toronto (ningyuan.chen@utoronto.ca); Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong (xfgao@se.cuhk.edu.hk) |
| Pseudocode | Yes | Algorithm 1: Spectral estimation of (µ, P) from the observations from the exploration phase [2, 3, 8]. Algorithm 2: The SEEU Algorithm. |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper describes a simulated experiment setting with specified parameters (P, mu) but does not mention the use of any publicly available or open datasets, nor does it provide access information for any dataset. |
| Dataset Splits | No | The paper describes a simulated experiment setting but does not provide specific details on training, validation, or test dataset splits. |
| Hardware Specification | Yes | The numerical experiments are conducted on a PC with 3.10 GHz Intel Processor and 16 GB of RAM. |
| Software Dependencies | No | The paper mentions general tools but does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | Input: Initial belief b1, precision δ, exploration parameter τ1, exploitation parameter τ2. For the example above, we calculate the average regret for several pairs of parameters (τ1, τ2). It can be seen that the choices of these parameters do not affect the order O(T^{2/3}) of the regret (the slope). See Figure 1(b) for an illustration. |
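
The Experiment Setup row reads the regret order off the slope of a log-log plot: if regret grows like c·T^{2/3}, then log(regret) is linear in log(T) with slope 2/3, and the constant c only shifts the intercept. The following is a minimal sketch of that check, not the authors' code. The function name `loglog_slope`, the horizons, and the constant 5.0 are assumptions made for illustration, and the regret values are synthetic rather than taken from the paper.

```python
import math


def loglog_slope(horizons, regrets):
    """Least-squares slope of log(regret) against log(T).

    Hypothetical helper for illustration; not part of the paper.
    """
    xs = [math.log(t) for t in horizons]
    ys = [math.log(r) for r in regrets]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Ordinary least squares: slope = cov(x, y) / var(x)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic placeholder data (NOT results from the paper):
# regret that grows exactly like 5 * T^(2/3).
horizons = [1_000, 10_000, 100_000, 1_000_000]
synthetic_regret = [5.0 * t ** (2 / 3) for t in horizons]

slope = loglog_slope(horizons, synthetic_regret)
print(round(slope, 3))
```

On this synthetic input the fitted slope is 2/3 up to floating-point error. With measured regret averaged over many runs, the slope would only approximate the theoretical order, so it is worth fitting on the larger horizons where lower-order terms matter less.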