Noise-Adaptive Thompson Sampling for Linear Contextual Bandits

Authors: Ruitu Xu, Yifei Min, Tianhao Wang

NeurIPS 2023

Variable | Result | LLM Response
Research Type | Experimental | In our simulation, we test our algorithm on a linear contextual bandit problem with different sets of reward perturbations. Specifically, the simulation is carried out over T = 2000 rounds with an ambient dimension of d = 25. For each selection round t ∈ [T], the environment yields a set of contexts Xt, which contains K = 50 context vectors. The context vectors are randomly drawn and structured as truncated Gaussian vectors, ensuring their norms remain bounded by 1. Within each trial, a random ground-truth vector θ is generated as another Gaussian vector. The reward associated with every context vector is then computed from the expected reward, perturbed by Gaussian noise. We employ our proposed algorithms, LinVDTS and LinNATS, to scrutinize their performance within this configuration. Figure 1 presents the cumulative regrets of the LinVDTS and LinNATS algorithms, using vanilla TS as a benchmark.
Researcher Affiliation | Academia | Ruitu Xu, Department of Statistics and Data Science, Yale University, New Haven, CT 06511, ruitu.xu@yale.edu; Yifei Min, Department of Statistics and Data Science, Yale University, New Haven, CT 06511, yifei.min@yale.edu; Tianhao Wang, Department of Statistics and Data Science, Yale University, New Haven, CT 06511, tianhao.wang@yale.edu
Pseudocode | Yes | Algorithm 1: Linear Thompson Sampling; Algorithm 2: Linear Variance-Dependent Thompson Sampling (LinVDTS); Algorithm 3: Linear Noise-Adaptive Thompson Sampling (LinNATS)
Open Source Code | No | The paper does not provide any explicit statement about releasing source code, nor a link to a code repository for its methodology.
Open Datasets | No | The paper describes a simulation setup ("In our simulation, we test our algorithm on a linear contextual bandit problem...") in which contexts and parameters are randomly generated, rather than a specific, publicly available dataset with a formal train split or a public access link or citation.
Dataset Splits | No | The paper describes a simulation setup ("The simulation is carried out over T = 2000 rounds with an ambient dimension of d = 25."), which is a time horizon for a bandit problem, not a train/validation/test split of a static dataset.
Hardware Specification | No | The paper describes the simulation parameters (e.g., T = 2000 rounds, dimension d = 25, K = 50 context vectors) and the nature of the generated data, but gives no specific details about the hardware (e.g., CPU or GPU models) used to run the simulations.
Software Dependencies | No | The paper describes the algorithms and simulation parameters, but does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, or specific libraries).
Experiment Setup | Yes | In our simulation, we test our algorithm on a linear contextual bandit problem with different sets of reward perturbations. Specifically, the simulation is carried out over T = 2000 rounds with an ambient dimension of d = 25. For each selection round t ∈ [T], the environment yields a set of contexts Xt, which contains K = 50 context vectors. The context vectors are randomly drawn and structured as truncated Gaussian vectors, ensuring their norms remain bounded by 1. Within each trial, a random ground-truth vector θ is generated as another Gaussian vector. The reward associated with every context vector is then computed from the expected reward, perturbed by Gaussian noise. We employ our proposed algorithms, LinVDTS and LinNATS, to scrutinize their performance within this configuration.
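The experiment setup quoted above (T = 2000 rounds, d = 25, K = 50 truncated Gaussian contexts with norms bounded by 1, a Gaussian ground-truth vector, and Gaussian reward noise) can be sketched in a few lines of NumPy. This is a minimal reconstruction of the environment plus the vanilla linear Thompson sampling baseline (Algorithm 1), not the paper's LinVDTS or LinNATS algorithms; the ridge parameter, posterior scale, and noise standard deviation below are illustrative choices that the excerpt does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, K = 2000, 25, 50  # rounds, ambient dimension, contexts per round

# Ground-truth parameter: a Gaussian vector (normalized here for boundedness).
theta_star = rng.standard_normal(d)
theta_star /= np.linalg.norm(theta_star)

def sample_contexts():
    """K Gaussian context vectors, truncated so every norm stays <= 1."""
    X = rng.standard_normal((K, d)) / np.sqrt(d)
    norms = np.linalg.norm(X, axis=1)
    X[norms > 1] /= norms[norms > 1, None]  # project back into the unit ball
    return X

# Vanilla linear Thompson sampling with a ridge-regression posterior.
lam, v, noise_sd = 1.0, 0.5, 0.1  # illustrative hyperparameters
V = lam * np.eye(d)               # regularized design matrix
b = np.zeros(d)                   # accumulated reward-weighted contexts
regret = 0.0

for t in range(T):
    X = sample_contexts()
    theta_hat = np.linalg.solve(V, b)
    # Sample a parameter from the Gaussian posterior N(theta_hat, v^2 V^{-1}).
    theta_tilde = rng.multivariate_normal(theta_hat, v**2 * np.linalg.inv(V))
    a = int(np.argmax(X @ theta_tilde))
    reward = X[a] @ theta_star + noise_sd * rng.standard_normal()
    regret += np.max(X @ theta_star) - X[a] @ theta_star
    V += np.outer(X[a], X[a])
    b += reward * X[a]

print(f"cumulative regret of vanilla linear TS after T={T} rounds: {regret:.2f}")
```

Because contexts and the ground-truth vector all lie in the unit ball, the per-round regret is at most 2, so the cumulative regret is trivially bounded by 2T; a working TS implementation should land far below that on this instance.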