Noise-Adaptive Thompson Sampling for Linear Contextual Bandits

Authors: Ruitu Xu, Yifei Min, Tianhao Wang

NeurIPS 2023

Variable | Result | LLM Response
Research Type | Experimental | In our simulation, we test our algorithm on a linear contextual bandit problem with different sets of reward perturbations. Specifically, the simulation is carried out over T = 2000 rounds with an ambient dimension of d = 25. For each selection round t ∈ [T], the environment yields a set of contexts Xt, which contains K = 50 context vectors. The context vectors are randomly drawn and structured as truncated Gaussian vectors, ensuring their norms remain bounded by 1. Within each trial, a random ground-truth vector θ is generated as another Gaussian vector. The reward associated with every context vector is then computed from the expected reward, perturbed by Gaussian noise. We employ our proposed algorithms, LinVDTS and LinNATS, to scrutinize their performance within this configuration. Figure 1 presents the cumulative regrets of the LinVDTS and LinNATS algorithms, using vanilla TS as a benchmark.
Researcher Affiliation | Academia | Ruitu Xu, Department of Statistics and Data Science, Yale University, New Haven, CT 06511, ruitu.xu@yale.edu; Yifei Min, Department of Statistics and Data Science, Yale University, New Haven, CT 06511, yifei.min@yale.edu; Tianhao Wang, Department of Statistics and Data Science, Yale University, New Haven, CT 06511, tianhao.wang@yale.edu
Pseudocode | Yes | Algorithm 1: Linear Thompson Sampling; Algorithm 2: Linear Variance-Dependent Thompson Sampling (LinVDTS); Algorithm 3: Linear Noise-Adaptive Thompson Sampling (LinNATS)
Open Source Code | No | The paper does not provide any explicit statement about releasing source code, nor a link to a code repository for its methodology.
Open Datasets | No | The paper describes a simulation setup ("In our simulation, we test our algorithm on a linear contextual bandit problem...") in which contexts and parameters are randomly generated, rather than a specific, publicly available dataset with a formal train split or a public access link or citation.
Dataset Splits | No | The paper describes a simulation setup ("The simulation is carried out over T = 2000 rounds with an ambient dimension of d = 25."), which is a time horizon for a bandit problem, not a train/validation/test split of a static dataset.
Hardware Specification | No | The paper describes the simulation parameters (e.g., T = 2000 rounds, dimension d = 25, K = 50 context vectors) and the nature of the generated data, but gives no specific details about the hardware (e.g., CPU or GPU models) used to run the simulations.
Software Dependencies | No | The paper describes the algorithms and simulation parameters, but does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, or specific libraries).
Experiment Setup | Yes | In our simulation, we test our algorithm on a linear contextual bandit problem with different sets of reward perturbations. Specifically, the simulation is carried out over T = 2000 rounds with an ambient dimension of d = 25. For each selection round t ∈ [T], the environment yields a set of contexts Xt, which contains K = 50 context vectors. The context vectors are randomly drawn and structured as truncated Gaussian vectors, ensuring their norms remain bounded by 1. Within each trial, a random ground-truth vector θ is generated as another Gaussian vector. The reward associated with every context vector is then computed from the expected reward, perturbed by Gaussian noise. We employ our proposed algorithms, LinVDTS and LinNATS, to scrutinize their performance within this configuration.
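The experiment setup quoted above (T = 2000 rounds, d = 25, K = 50 truncated Gaussian contexts with norms bounded by 1, a Gaussian ground-truth vector, and Gaussian reward noise) can be sketched in a few lines of NumPy. This is a minimal reconstruction of the environment plus the vanilla linear Thompson sampling baseline (Algorithm 1), not the paper's LinVDTS or LinNATS algorithms; the ridge parameter, posterior scale, and noise standard deviation below are illustrative choices that the excerpt does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, K = 2000, 25, 50  # rounds, ambient dimension, contexts per round

# Ground-truth parameter: a Gaussian vector (normalized here for boundedness).
theta_star = rng.standard_normal(d)
theta_star /= np.linalg.norm(theta_star)

def sample_contexts():
    """K Gaussian context vectors, truncated so every norm stays <= 1."""
    X = rng.standard_normal((K, d)) / np.sqrt(d)
    norms = np.linalg.norm(X, axis=1)
    X[norms > 1] /= norms[norms > 1, None]  # project back into the unit ball
    return X

# Vanilla linear Thompson sampling with a ridge-regression posterior.
lam, v, noise_sd = 1.0, 0.5, 0.1  # illustrative hyperparameters
V = lam * np.eye(d)               # regularized design matrix
b = np.zeros(d)                   # accumulated reward-weighted contexts
regret = 0.0

for t in range(T):
    X = sample_contexts()
    theta_hat = np.linalg.solve(V, b)
    # Sample a parameter from the Gaussian posterior N(theta_hat, v^2 V^{-1}).
    theta_tilde = rng.multivariate_normal(theta_hat, v**2 * np.linalg.inv(V))
    a = int(np.argmax(X @ theta_tilde))
    reward = X[a] @ theta_star + noise_sd * rng.standard_normal()
    regret += np.max(X @ theta_star) - X[a] @ theta_star
    V += np.outer(X[a], X[a])
    b += reward * X[a]

print(f"cumulative regret of vanilla linear TS after T={T} rounds: {regret:.2f}")
```

Because contexts and the ground-truth vector all lie in the unit ball, the per-round regret is at most 2, so the cumulative regret is trivially bounded by 2T; a working TS implementation should land far below that on this instance.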