Nash Regret Guarantees for Linear Bandits

Authors: Ayush Sawarni, Soumyabrata Pal, Siddharth Barman

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments to compare the performance of our algorithm LINNASH with Thompson Sampling on synthetic data.
Researcher Affiliation | Collaboration | Ayush Sawarni, Indian Institute of Science Bangalore (sawarniayush@gmail.com); Soumyabrata Pal, Google Research Bangalore (soumyabrata@google.com); Siddharth Barman, Indian Institute of Science Bangalore (barman@iisc.ac.in)
Pseudocode | Yes | Algorithm 1: Generate Arm Sequence
Open Source Code | No | The paper does not provide explicit statements or links to open-source code for the described methodology.
Open Datasets | No | We conduct experiments to compare the performance of our algorithm LINNASH with Thompson Sampling on synthetic data.
Dataset Splits | No | The paper describes using synthetic data but does not provide specific training, validation, or test dataset splits.
Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions the algorithms used (e.g., LINNASH, Thompson Sampling) but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | Yes | We fine-tune the parameters of both algorithms and evaluate their performance in the following experimental setup: We fix the ambient dimension d = 80, the number of arms |X| = 10000, and the number of rounds T = 50000. Both the unknown parameter vector, θ, and the arm embeddings are sampled from a multivariate Gaussian distribution. Subsequently, the arm embeddings are shifted and normalized to ensure that all mean rewards are non-negative, with the maximum mean reward set to 0.5. Upon pulling an arm, we observe a Bernoulli random variable with a probability corresponding to its mean reward.
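For context, the synthetic setup quoted in the Experiment Setup row can be reproduced in outline as follows. This is a minimal NumPy sketch under our own assumptions: the shift-along-θ normalization, the variable names, and the pull helper are illustrative choices, not the authors' released code (none is available, per the Open Source Code row).

    import numpy as np

    rng = np.random.default_rng(0)

    d = 80           # ambient dimension
    n_arms = 10_000  # |X|
    T = 50_000       # number of rounds (horizon for LINNASH / Thompson Sampling)

    # Unknown parameter vector and arm embeddings, both sampled from a Gaussian.
    theta = rng.normal(size=d)
    arms = rng.normal(size=(n_arms, d))

    # Shift the embeddings along theta so the smallest mean reward becomes 0,
    # then rescale so the largest mean reward equals 0.5 -- one way to realize
    # the "shifted and normalized" step described above (an assumption).
    rewards = arms @ theta
    arms = arms - (rewards.min() / np.dot(theta, theta)) * theta
    rewards = arms @ theta
    arms *= 0.5 / rewards.max()
    mean_rewards = arms @ theta  # all mean rewards now lie in [0, 0.5]

    def pull(arm_index: int) -> int:
        """Pulling an arm yields a Bernoulli observation with its mean reward."""
        return int(rng.random() < mean_rewards[arm_index])

    # Example: observations collected by a uniformly random policy over T rounds.
    random_arms = rng.integers(n_arms, size=T)
    observations = np.array([pull(i) for i in random_arms])

Any algorithm being compared (LINNASH or Thompson Sampling) would then interact with the instance only through pull, matching the Bernoulli feedback model described in the setup.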