An Information-Theoretic Analysis of Nonstationary Bandit Learning

Authors: Seungki Min, Daniel Russo

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | See Figure 2, where we report the effect of τ^cm and τ^id on the performance of several bandit algorithms... Remarkably, their performances appear to be sensitive only to τ^id but not to τ^cm... Appendix B. Numerical Experiment in Detail. We here illustrate the detailed procedure of the numerical experiment conducted in Section 1.1. ... In all experiments, we use T = 1000 and S = 1000.
Researcher Affiliation | Academia | (1) Department of Industrial and Systems Engineering, KAIST; (2) Division of Decision, Risk, and Operations, Columbia Business School.
Pseudocode | No | Thompson sampling (TS) is assumed to know the dynamics of the latent state processes as well as the exact values of τ^cm and τ^id (i.e., no model/prior misspecification). More specifically, in each period t, π^TS draws a random sample... Sliding-Window TS is a simple modification of stationary Thompson sampling that behaves as if the environment is stationary but discards all observations revealed more than L periods ago. (A sketch of Sliding-Window TS appears after the table.)
Open Source Code | No | The paper does not provide any statement or link regarding the release of open-source code for the methodology described.
Open Datasets | No | Generative model. We here illustrate the detailed procedure of the numerical experiment conducted in Section 1.1. ... As described in Section 1.1, we consider a nonstationary two-arm Gaussian bandit with unit noise variance: R_{t,a} = θ^cm_t + θ^id_{t,a} + ε_{t,a}, a ∈ {1, 2}, t ∈ ℕ, where the ε_{t,a}'s are i.i.d. N(0, 1^2) noises, (θ^cm_t)_{t∈ℕ} is the common variation process GP(1^2, τ^cm), and (θ^id_{t,a})_{t∈ℕ} is arm a's idiosyncratic variation process GP(1^2, τ^id). (A simulation sketch of this generative model appears after the table.)
Dataset Splits | No | Given an environment specified by (τ^cm, τ^id), we estimate the per-period regret of an algorithm π using S sample paths: Δ̂_T(π; τ^cm, τ^id) := (1/S) Σ_{s=1}^{S} Δ_T(A^{π,(s)}; θ^{(s)}), where θ^{(s)} is the s-th sample path (shared by all algorithms) and A^{π,(s)} is the action sequence taken by π along this sample path. In all experiments, we use T = 1000 and S = 1000.
Hardware Specification | No | The paper describes simulation results but does not provide any details about the specific hardware used to run these simulations or experiments.
Software Dependencies | No | The paper describes the generative model and the algorithms used (Thompson sampling, Sliding-Window TS, Sliding-Window UCB) but does not list specific software dependencies with version numbers.
Experiment Setup | Yes | In all experiments, we use T = 1000 and S = 1000. ... Sliding-Window TS ... with window length L draws a random sample... (L ∈ {10, 50, 100}) ... Sliding-Window UCB ... β is a control parameter determining the degree of exploration (β = 2.0 used in Figure 6). (A Sliding-Window UCB sketch appears after the table.)
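
The generative model quoted in the Open Datasets row and the Monte Carlo regret estimator quoted in the Dataset Splits row can be read as a small simulation harness. The sketch below is an illustration rather than the authors' code: GP(1^2, τ) is instantiated as a stationary AR(1) Gaussian process with unit variance whose correlation decays on timescale τ, which is an assumption about the paper's process definition, and all function names (sample_gp, sample_environment, estimate_regret) are hypothetical.

```python
import numpy as np

def sample_gp(T, tau, rng):
    """Stationary AR(1) path with unit variance and correlation decaying on timescale tau.
    Assumed stand-in for the paper's GP(1^2, tau) process."""
    rho = np.exp(-1.0 / tau)
    x = np.empty(T)
    x[0] = rng.normal()
    for t in range(1, T):
        x[t] = rho * x[t - 1] + np.sqrt(1.0 - rho ** 2) * rng.normal()
    return x

def sample_environment(T, tau_cm, tau_id, n_arms=2, rng=None):
    """Mean rewards theta[t, a] = theta_cm[t] + theta_id[t, a] (two-arm Gaussian bandit)."""
    rng = rng if rng is not None else np.random.default_rng()
    theta_cm = sample_gp(T, tau_cm, rng)                                  # common variation
    theta_id = np.column_stack([sample_gp(T, tau_id, rng) for _ in range(n_arms)])
    return theta_cm[:, None] + theta_id                                   # shape (T, n_arms)

def per_period_regret(actions, theta):
    """Per-period regret of an action sequence along one sample path of mean rewards."""
    T = theta.shape[0]
    return float(np.mean(theta.max(axis=1) - theta[np.arange(T), actions]))

def estimate_regret(make_policy, tau_cm, tau_id, T=1000, S=1000, seed=0):
    """Monte Carlo estimate over S sample paths (the paper reports T = 1000, S = 1000)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(S):
        theta = sample_environment(T, tau_cm, tau_id, rng=rng)
        policy = make_policy()
        actions = np.empty(T, dtype=int)
        for t in range(T):
            a = policy.select(t)
            reward = theta[t, a] + rng.normal()   # unit-variance noise on the chosen arm only
            policy.update(t, a, reward)
            actions[t] = a
        total += per_period_regret(actions, theta)
    return total / S
```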
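The Pseudocode row describes Sliding-Window TS as stationary Thompson sampling that discards observations revealed more than L periods ago. A minimal sketch follows under extra assumptions the table does not state, namely a N(0, 1) prior on each arm's mean reward and unit observation noise; it plugs into the estimate_regret harness above via its select/update interface.

```python
from collections import deque
import numpy as np

class SlidingWindowTS:
    """Stationary Gaussian Thompson sampling restricted to the last `window` periods.
    The N(0, 1) prior per arm and unit noise variance are illustrative assumptions."""

    def __init__(self, n_arms=2, window=50, rng=None):
        self.n_arms = n_arms
        self.window = window
        self.rng = rng if rng is not None else np.random.default_rng()
        self.history = deque()   # (period, arm, reward) tuples inside the window

    def select(self, t):
        # Discard observations revealed more than `window` periods ago.
        while self.history and self.history[0][0] < t - self.window:
            self.history.popleft()
        counts = np.zeros(self.n_arms)
        sums = np.zeros(self.n_arms)
        for _, a, r in self.history:
            counts[a] += 1.0
            sums[a] += r
        post_mean = sums / (counts + 1.0)        # posterior under N(0, 1) prior, unit noise
        post_var = 1.0 / (counts + 1.0)
        draw = self.rng.normal(post_mean, np.sqrt(post_var))
        return int(np.argmax(draw))

    def update(self, t, arm, reward):
        self.history.append((t, arm, reward))
```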
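The Experiment Setup row mentions Sliding-Window UCB with exploration parameter β (β = 2.0 in Figure 6) but does not quote the index itself, so the bonus term below, β · sqrt(log(min(t + 1, L)) / n_a) over the windowed pull count n_a, is an assumed stand-in in the spirit of standard sliding-window UCB rather than the paper's exact formula.

```python
from collections import deque
import numpy as np

class SlidingWindowUCB:
    """UCB computed from the last `window` periods; the bonus form is an assumed illustration."""

    def __init__(self, n_arms=2, window=50, beta=2.0):
        self.n_arms = n_arms
        self.window = window
        self.beta = beta
        self.history = deque()   # (period, arm, reward) tuples inside the window

    def select(self, t):
        # Discard observations revealed more than `window` periods ago.
        while self.history and self.history[0][0] < t - self.window:
            self.history.popleft()
        counts = np.zeros(self.n_arms)
        sums = np.zeros(self.n_arms)
        for _, a, r in self.history:
            counts[a] += 1.0
            sums[a] += r
        # Pull any arm with no data inside the window before trusting the index.
        if np.any(counts == 0):
            return int(np.argmin(counts))
        means = sums / counts
        bonus = self.beta * np.sqrt(np.log(min(t + 1, self.window)) / counts)
        return int(np.argmax(means + bonus))

    def update(self, t, arm, reward):
        self.history.append((t, arm, reward))
```

Under these assumptions, a call such as estimate_regret(lambda: SlidingWindowUCB(window=50, beta=2.0), tau_cm=10.0, tau_id=100.0) illustrates how a single (τ^cm, τ^id) point in Figure 2 or Figure 6 might be re-simulated; the timescale values here are placeholders, not the grid used in the paper.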