Noise-Adaptive Thompson Sampling for Linear Contextual Bandits
Authors: Ruitu Xu, Yifei Min, Tianhao Wang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our simulation, we test our algorithm on a linear contextual bandit problem with different sets of reward perturbations. Specifically, the simulation is carried out over T = 2000 rounds with an ambient dimension of d = 25. For each selection round t ∈ [T], the environment yields a set of contexts Xt, which contains K = 50 context vectors. The context vectors are randomly drawn and structured as truncated Gaussian vectors, ensuring their norms remain bounded by 1. Within each trial, a random ground truth vector θ is generated as another Gaussian vector. The reward associated with every context vector is then computed based on the expected reward, perturbed by Gaussian noise. We employ our proposed algorithms, Lin VDTS and Lin NATS, to scrutinize their performance within this configuration. Figure 1 presents the cumulative regrets of the Lin VDTS and Lin NATS algorithms, using the vanilla TS as a benchmark. |
| Researcher Affiliation | Academia | Ruitu Xu, Department of Statistics and Data Science, Yale University, New Haven, CT 06511, ruitu.xu@yale.edu; Yifei Min, Department of Statistics and Data Science, Yale University, New Haven, CT 06511, yifei.min@yale.edu; Tianhao Wang, Department of Statistics and Data Science, Yale University, New Haven, CT 06511, tianhao.wang@yale.edu |
| Pseudocode | Yes | Algorithm 1 Linear Thompson Sampling; Algorithm 2 Linear Variance-Dependent Thompson Sampling (Lin VDTS); Algorithm 3 Linear Noise-Adaptive Thompson Sampling (Lin NATS) |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for its methodology. |
| Open Datasets | No | The paper describes a simulation setup ('In our simulation, we test our algorithm on a linear contextual bandit problem...') where contexts and parameters are randomly generated, rather than using a specific, publicly available dataset with a formal train split or a public access link/citation. |
| Dataset Splits | No | The paper describes a simulation setup ('The simulation is carried out over T = 2000 rounds with an ambient dimension of d = 25.') which is a time horizon for a bandit problem, not a train/validation/test split of a static dataset. |
| Hardware Specification | No | The paper describes the simulation parameters (e.g., T=2000 rounds, d=25 dimension, K=50 context vectors) and the nature of the generated data, but does not provide any specific details about the hardware (e.g., CPU, GPU models) used to run these simulations. |
| Software Dependencies | No | The paper describes the algorithms and simulation parameters, but does not list any specific software dependencies with version numbers (e.g., Python, PyTorch, specific libraries). |
| Experiment Setup | Yes | In our simulation, we test our algorithm on a linear contextual bandit problem with different sets of reward perturbations. Specifically, the simulation is carried out over T = 2000 rounds with an ambient dimension of d = 25. For each selection round t ∈ [T], the environment yields a set of contexts Xt, which contains K = 50 context vectors. The context vectors are randomly drawn and structured as truncated Gaussian vectors, ensuring their norms remain bounded by 1. Within each trial, a random ground truth vector θ is generated as another Gaussian vector. The reward associated with every context vector is then computed based on the expected reward, perturbed by Gaussian noise. We employ our proposed algorithms, Lin VDTS and Lin NATS, to scrutinize their performance within this configuration. |
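
The experiment-setup description above (T = 2000 rounds, d = 25, K = 50 truncated Gaussian contexts with norm at most 1, a Gaussian ground-truth vector, and Gaussian reward noise) can be reconstructed as a simulation environment. The sketch below is a minimal illustration under stated assumptions, not the authors' code: the exact truncation scheme, the scaling of the ground-truth vector, and the noise standard deviation are illustrative choices that the excerpt does not pin down.

```python
# Minimal sketch of the described bandit environment (assumptions noted inline).
import numpy as np

T, d, K = 2000, 25, 50  # rounds, ambient dimension, contexts per round
rng = np.random.default_rng(0)

def sample_contexts(rng, K, d):
    """Draw K Gaussian context vectors and truncate them to the unit ball."""
    X = rng.normal(size=(K, d))
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1.0)  # caps each norm at 1, leaves short vectors untouched

# Ground-truth parameter: a Gaussian vector (rescaled here so rewards stay O(1); assumption).
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)

noise_std = 0.1  # one illustrative "reward perturbation" level; the paper varies this

def pull(x, rng):
    """Observe a noisy reward for the chosen context vector x."""
    return x @ theta_star + rng.normal(scale=noise_std)
```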
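
Continuing from the environment sketch above, the following is a hedged sketch of vanilla linear Thompson Sampling, the benchmark used in Figure 1 (the paper's Algorithm 1). The ridge parameter `lam` and posterior scale `v` are illustrative assumptions, not values reported in the paper, and this is not the authors' implementation of Lin VDTS or Lin NATS.

```python
# Vanilla linear Thompson Sampling loop on the environment defined above.
lam, v = 1.0, 0.5
V = lam * np.eye(d)          # regularized design matrix
b = np.zeros(d)              # running sum of reward-weighted contexts
regret = np.zeros(T)

for t in range(T):
    X_t = sample_contexts(rng, K, d)
    theta_hat = np.linalg.solve(V, b)                       # ridge estimate
    cov = v**2 * np.linalg.inv(V)                           # posterior covariance
    theta_tilde = rng.multivariate_normal(theta_hat, cov)   # posterior sample
    a = int(np.argmax(X_t @ theta_tilde))                   # act greedily w.r.t. the sample
    x = X_t[a]
    r = pull(x, rng)
    V += np.outer(x, x)                                     # rank-one design update
    b += r * x
    regret[t] = np.max(X_t @ theta_star) - x @ theta_star   # instantaneous regret

print("cumulative regret:", regret.cumsum()[-1])
```

Plotting `regret.cumsum()` over the 2000 rounds gives a curve comparable in spirit to the cumulative-regret plots described for Figure 1.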