An Information-Theoretic Analysis of Nonstationary Bandit Learning

Authors: Seungki Min, Daniel Russo

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | See Figure 2, where we report the effect of τ^cm and τ^id on the performance of several bandit algorithms... Remarkably, their performances appear to be sensitive only to τ^id but not to τ^cm... Appendix B. Numerical Experiment in Detail. We here illustrate the detailed procedure of the numerical experiment conducted in Section 1.1. ... In all experiments, we use T = 1000 and S = 1000.
Researcher Affiliation | Academia | (1) Department of Industrial and Systems Engineering, KAIST; (2) Division of Decision, Risk, and Operations, Columbia Business School.
Pseudocode | No | Thompson sampling (TS) is assumed to know the dynamics of the latent state processes as well as the exact values of τ^cm and τ^id (i.e., no model/prior misspecification). More specifically, in each period t, π^TS draws a random sample... Sliding-Window TS is a simple modification of stationary Thompson sampling that behaves as if the environment is stationary but discards all observations revealed more than L periods ago. (A sketch of Sliding-Window TS appears after the table.)
Open Source Code | No | The paper does not provide any statement or link regarding the release of open-source code for the methodology described.
Open Datasets | No | Generative model. We here illustrate the detailed procedure of the numerical experiment conducted in Section 1.1. ... As described in Section 1.1, we consider a nonstationary two-arm Gaussian bandit with unit noise variance: R_{t,a} = θ^cm_t + θ^id_{t,a} + ε_{t,a}, a ∈ {1, 2}, t ∈ ℕ, where the ε_{t,a}'s are i.i.d. N(0, 1^2) noises, (θ^cm_t)_{t∈ℕ} is the common variation process GP(1^2, τ^cm), and (θ^id_{t,a})_{t∈ℕ} is arm a's idiosyncratic variation process GP(1^2, τ^id). (A simulation sketch of this generative model appears after the table.)
Dataset Splits | No | Given an environment specified by (τ^cm, τ^id), we estimate the per-period regret of an algorithm π using S sample paths: Δ̂_T(π; τ^cm, τ^id) := (1/S) Σ_{s=1}^{S} Δ_T(A^{π,(s)}; θ^{(s)}), where θ^{(s)} is the s-th sample path (shared by all algorithms) and A^{π,(s)} is the action sequence taken by π along this sample path. In all experiments, we use T = 1000 and S = 1000.
Hardware Specification | No | The paper describes simulation results but does not provide any details about the specific hardware used to run these simulations or experiments.
Software Dependencies | No | The paper describes the generative model and the algorithms used (Thompson sampling, Sliding-Window TS, Sliding-Window UCB) but does not list specific software dependencies with version numbers.
Experiment Setup | Yes | In all experiments, we use T = 1000 and S = 1000. ... Sliding-Window TS ... with window length L draws a random sample... (L ∈ {10, 50, 100}) ... Sliding-Window UCB ... β is a control parameter determining the degree of exploration (β = 2.0 used in Figure 6). (A Sliding-Window UCB sketch appears after the table.)
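
The generative model quoted in the Open Datasets row and the Monte Carlo regret estimator quoted in the Dataset Splits row can be read as a small simulation harness. The sketch below is an illustration rather than the authors' code: GP(1^2, τ) is instantiated as a stationary AR(1) Gaussian process with unit variance whose correlation decays on timescale τ, which is an assumption about the paper's process definition, and all function names (sample_gp, sample_environment, estimate_regret) are hypothetical.

```python
import numpy as np

def sample_gp(T, tau, rng):
    """Stationary AR(1) path with unit variance and correlation decaying on timescale tau.
    Assumed stand-in for the paper's GP(1^2, tau) process."""
    rho = np.exp(-1.0 / tau)
    x = np.empty(T)
    x[0] = rng.normal()
    for t in range(1, T):
        x[t] = rho * x[t - 1] + np.sqrt(1.0 - rho ** 2) * rng.normal()
    return x

def sample_environment(T, tau_cm, tau_id, n_arms=2, rng=None):
    """Mean rewards theta[t, a] = theta_cm[t] + theta_id[t, a] (two-arm Gaussian bandit)."""
    rng = rng if rng is not None else np.random.default_rng()
    theta_cm = sample_gp(T, tau_cm, rng)                                  # common variation
    theta_id = np.column_stack([sample_gp(T, tau_id, rng) for _ in range(n_arms)])
    return theta_cm[:, None] + theta_id                                   # shape (T, n_arms)

def per_period_regret(actions, theta):
    """Per-period regret of an action sequence along one sample path of mean rewards."""
    T = theta.shape[0]
    return float(np.mean(theta.max(axis=1) - theta[np.arange(T), actions]))

def estimate_regret(make_policy, tau_cm, tau_id, T=1000, S=1000, seed=0):
    """Monte Carlo estimate over S sample paths (the paper reports T = 1000, S = 1000)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(S):
        theta = sample_environment(T, tau_cm, tau_id, rng=rng)
        policy = make_policy()
        actions = np.empty(T, dtype=int)
        for t in range(T):
            a = policy.select(t)
            reward = theta[t, a] + rng.normal()   # unit-variance noise on the chosen arm only
            policy.update(t, a, reward)
            actions[t] = a
        total += per_period_regret(actions, theta)
    return total / S
```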
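The Pseudocode row describes Sliding-Window TS as stationary Thompson sampling that discards observations revealed more than L periods ago. A minimal sketch follows under extra assumptions the table does not state, namely a N(0, 1) prior on each arm's mean reward and unit observation noise; it plugs into the estimate_regret harness above via its select/update interface.

```python
from collections import deque
import numpy as np

class SlidingWindowTS:
    """Stationary Gaussian Thompson sampling restricted to the last `window` periods.
    The N(0, 1) prior per arm and unit noise variance are illustrative assumptions."""

    def __init__(self, n_arms=2, window=50, rng=None):
        self.n_arms = n_arms
        self.window = window
        self.rng = rng if rng is not None else np.random.default_rng()
        self.history = deque()   # (period, arm, reward) tuples inside the window

    def select(self, t):
        # Discard observations revealed more than `window` periods ago.
        while self.history and self.history[0][0] < t - self.window:
            self.history.popleft()
        counts = np.zeros(self.n_arms)
        sums = np.zeros(self.n_arms)
        for _, a, r in self.history:
            counts[a] += 1.0
            sums[a] += r
        post_mean = sums / (counts + 1.0)        # posterior under N(0, 1) prior, unit noise
        post_var = 1.0 / (counts + 1.0)
        draw = self.rng.normal(post_mean, np.sqrt(post_var))
        return int(np.argmax(draw))

    def update(self, t, arm, reward):
        self.history.append((t, arm, reward))
```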
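The Experiment Setup row mentions Sliding-Window UCB with exploration parameter β (β = 2.0 in Figure 6) but does not quote the index itself, so the bonus term below, β · sqrt(log(min(t + 1, L)) / n_a) over the windowed pull count n_a, is an assumed stand-in in the spirit of standard sliding-window UCB rather than the paper's exact formula.

```python
from collections import deque
import numpy as np

class SlidingWindowUCB:
    """UCB computed from the last `window` periods; the bonus form is an assumed illustration."""

    def __init__(self, n_arms=2, window=50, beta=2.0):
        self.n_arms = n_arms
        self.window = window
        self.beta = beta
        self.history = deque()   # (period, arm, reward) tuples inside the window

    def select(self, t):
        # Discard observations revealed more than `window` periods ago.
        while self.history and self.history[0][0] < t - self.window:
            self.history.popleft()
        counts = np.zeros(self.n_arms)
        sums = np.zeros(self.n_arms)
        for _, a, r in self.history:
            counts[a] += 1.0
            sums[a] += r
        # Pull any arm with no data inside the window before trusting the index.
        if np.any(counts == 0):
            return int(np.argmin(counts))
        means = sums / counts
        bonus = self.beta * np.sqrt(np.log(min(t + 1, self.window)) / counts)
        return int(np.argmax(means + bonus))

    def update(self, t, arm, reward):
        self.history.append((t, arm, reward))
```

Under these assumptions, a call such as estimate_regret(lambda: SlidingWindowUCB(window=50, beta=2.0), tau_cm=10.0, tau_id=100.0) illustrates how a single (τ^cm, τ^id) point in Figure 2 or Figure 6 might be re-simulated; the timescale values here are placeholders, not the grid used in the paper.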