An Information-Theoretic Analysis of Nonstationary Bandit Learning
Authors: Seungki Min, Daniel Russo
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | See Figure 2, where we report the effect of $\tau_{cm}$ and $\tau_{id}$ on the performance of several bandit algorithms... Remarkably, their performances appear to be sensitive only to $\tau_{id}$ but not to $\tau_{cm}$... Appendix B. Numerical Experiment in Detail. We here illustrate the detailed procedure of the numerical experiment conducted in Section 1.1. ... In all experiments, we use T = 1000 and S = 1000. |
| Researcher Affiliation | Academia | ¹Department of Industrial and Systems Engineering, KAIST; ²Division of Decision, Risk, and Operations, Columbia Business School. |
| Pseudocode | No | Thompson sampling (TS) is assumed to know the dynamics of latent state processes as well as the exact values of $\tau_{cm}$ and $\tau_{id}$ (i.e., no model/prior misspecification). More specifically, in each period t, $\pi^{TS}$ draws a random sample... Sliding-Window TS is a simple modification of stationary Thompson sampling that behaves as if the environment is stationary but discards all observations revealed before L periods ago. (A minimal sketch of Sliding-Window TS appears after the table.) |
| Open Source Code | No | The paper does not provide any statement or link regarding the release of open-source code for the methodology described. |
| Open Datasets | No | Generative model. We here illustrate the detailed procedure of the numerical experiment conducted in Section 1.1. ... As described in Section 1.1, we consider a nonstationary two-arm Gaussian bandit with unit noise variance: $R_{t,a} = \theta^{cm}_t + \theta^{id}_{t,a} + \epsilon_{t,a}$, $a \in \{1, 2\}$, $t \in \mathbb{N}$, where the $\epsilon_{t,a}$'s are i.i.d. noises $N(0, 1^2)$, $(\theta^{cm}_t)_{t \in \mathbb{N}}$ is the common variation process $GP(1^2, \tau_{cm})$, and $(\theta^{id}_{t,a})_{t \in \mathbb{N}}$ is arm a's idiosyncratic variation process $GP(1^2, \tau_{id})$. (A simulation sketch of this generative model appears after the table.) |
| Dataset Splits | No | Given an environment specified by $(\tau_{cm}, \tau_{id})$, we estimate the per-period regret of an algorithm $\pi$ using S sample paths: $\hat{\Delta}_T(\pi; \tau_{cm}, \tau_{id}) := \frac{1}{S} \sum_{s=1}^{S} \Delta_T(A^{\pi,(s)}; \theta^{(s)})$, where $\theta^{(s)}$ is the s-th sample path (shared by all algorithms) and $A^{\pi,(s)}$ is the action sequence taken by $\pi$ along this sample path. In all experiments, we use T = 1000 and S = 1000. (A Monte Carlo sketch of this estimator appears after the table.) |
| Hardware Specification | No | The paper describes simulation results but does not provide any details about the specific hardware used to run these simulations or experiments. |
| Software Dependencies | No | The paper describes the generative model and the algorithms used (Thompson sampling, Sliding-Window TS, Sliding-Window UCB) but does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | In all experiments, we use T = 1000 and S = 1000. ... Sliding-Window TS ... with window length L draws a random sample... ($L \in \{10, 50, 100\}$)... Sliding-Window UCB... β is a control parameter determining the degree of exploration (β = 2.0 used in Figure 6). |
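
The paper releases no code, so the following is a minimal simulation sketch of the generative model quoted under Open Datasets. It assumes that $GP(1^2, \tau)$ denotes a stationary Gaussian process with marginal variance 1 and autocorrelation timescale τ, implemented here as an AR(1) recursion with correlation exp(-1/τ); the exact parametrization in the paper's Appendix B may differ, and all function names are illustrative.

```python
import numpy as np

def sample_gp(T, tau, sigma2=1.0, rng=None):
    """Sample a length-T stationary Gaussian process.

    Assumed parametrization: marginal variance sigma2 and autocorrelation
    exp(-|s - t| / tau), generated by an AR(1) recursion.
    """
    rng = np.random.default_rng() if rng is None else rng
    rho = np.exp(-1.0 / tau)
    theta = np.empty(T)
    theta[0] = rng.normal(0.0, np.sqrt(sigma2))
    for t in range(1, T):
        theta[t] = rho * theta[t - 1] + np.sqrt(sigma2 * (1.0 - rho ** 2)) * rng.normal()
    return theta

def sample_environment(T, tau_cm, tau_id, n_arms=2, rng=None):
    """Mean rewards theta[t, a] = theta_cm[t] + theta_id[t, a] for a two-arm Gaussian bandit."""
    rng = np.random.default_rng() if rng is None else rng
    theta_cm = sample_gp(T, tau_cm, rng=rng)
    theta_id = np.stack([sample_gp(T, tau_id, rng=rng) for _ in range(n_arms)], axis=1)
    return theta_cm[:, None] + theta_id  # shape (T, n_arms)
```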
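
Since the Pseudocode row describes Sliding-Window TS only in prose, here is a hedged sketch of one natural reading: stationary Gaussian Thompson sampling with independent N(0, 1) priors and known unit noise variance, restricted to the L most recently revealed observations. The posterior update below is an assumption of this simplified reading; the paper's exact TS benchmark, which conditions on the full latent-state dynamics, is not reproduced here.

```python
import numpy as np

def sliding_window_ts(theta, L, rng=None):
    """Run a sketch of Sliding-Window TS on a mean-reward matrix theta of shape (T, n_arms).

    Returns the average per-period (pseudo-)regret along this sample path.
    """
    rng = np.random.default_rng() if rng is None else rng
    T, n_arms = theta.shape
    history = []   # (arm, reward) pairs, one per period
    regret = 0.0
    for t in range(T):
        recent = history[-L:]  # discard observations revealed more than L periods ago
        samples = np.empty(n_arms)
        for a in range(n_arms):
            rewards = [r for (arm, r) in recent if arm == a]
            post_var = 1.0 / (len(rewards) + 1.0)      # N(0, 1) prior, unit noise
            post_mean = post_var * sum(rewards)
            samples[a] = rng.normal(post_mean, np.sqrt(post_var))
        a_t = int(np.argmax(samples))
        reward = theta[t, a_t] + rng.normal()          # unit-variance observation noise
        history.append((a_t, reward))
        regret += theta[t].max() - theta[t, a_t]
    return regret / T
```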
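
Finally, a sketch of the Monte Carlo estimator quoted under Dataset Splits, with T = 1000 and S = 1000 as reported for the experiments. It reuses sample_environment and sliding_window_ts from the sketches above; the paper shares the same S sample paths across all algorithms, whereas the simplified version below only fixes a seed per call, and the algorithm callback signature is hypothetical.

```python
import numpy as np

def estimate_per_period_regret(algorithm, tau_cm, tau_id, T=1000, S=1000, seed=0):
    """Monte Carlo estimate of per-period regret, averaged over S sample paths."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for s in range(S):
        theta = sample_environment(T, tau_cm, tau_id, rng=rng)  # the s-th sample path
        total += algorithm(theta, rng=rng)                      # per-period regret of pi on it
    return total / S

# Example usage (window length chosen arbitrarily for illustration):
# regret = estimate_per_period_regret(
#     lambda th, rng: sliding_window_ts(th, L=50, rng=rng),
#     tau_cm=100.0, tau_id=10.0)
```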