Nash Regret Guarantees for Linear Bandits
Authors: Ayush Sawarni, Soumyabrata Pal, Siddharth Barman
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments to compare the performance of our algorithm LINNASH with Thompson Sampling on synthetic data. |
| Researcher Affiliation | Collaboration | Ayush Sawarni, Indian Institute of Science Bangalore (sawarniayush@gmail.com); Soumyabrata Pal, Google Research Bangalore (soumyabrata@google.com); Siddharth Barman, Indian Institute of Science Bangalore (barman@iisc.ac.in) |
| Pseudocode | Yes | Algorithm 1: Generate Arm Sequence |
| Open Source Code | No | The paper does not provide explicit statements or links to open-source code for the described methodology. |
| Open Datasets | No | We conduct experiments to compare the performance of our algorithm LINNASH with Thompson Sampling on synthetic data. |
| Dataset Splits | No | The paper describes using synthetic data but does not provide specific training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms used (e.g., LINNASH, Thompson Sampling) but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We fine-tune the parameters of both algorithms and evaluate their performance in the following experimental setup: We fix the ambient dimension d = 80, the number of arms \|X\| = 10000, and the number of rounds T = 50000. Both the unknown parameter vector, θ, and the arm embeddings are sampled from a multivariate Gaussian distribution. Subsequently, the arm embeddings are shifted and normalized to ensure that all mean rewards are non-negative, with the maximum reward mean being set to 0.5. Upon pulling an arm, we observe a Bernoulli random variable with a probability corresponding to its mean reward. *(A sketch of this setup appears below the table.)* |
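
To make the quoted Experiment Setup concrete, here is a minimal Python sketch of the synthetic instance (d = 80, \|X\| = 10000, T = 50000) with Bernoulli reward feedback. The min–max rescaling used to make all mean rewards non-negative with maximum 0.5 is an assumption on our part; the paper only states that the embeddings are "shifted and normalized", and the `pull` helper below is a hypothetical name for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 80            # ambient dimension
n_arms = 10_000   # number of arms |X|
T = 50_000        # number of rounds

# Sample the unknown parameter vector and the arm embeddings
# from a standard multivariate Gaussian.
theta = rng.standard_normal(d)
arms = rng.standard_normal((n_arms, d))

# Raw mean rewards <x, theta> may be negative. As an illustrative stand-in
# for the paper's shift-and-normalize step, rescale the means so they lie
# in [0, 0.5] with the maximum equal to 0.5. (Equivalently, the shift and
# scale could be absorbed into the embeddings themselves.)
means = arms @ theta
means = (means - means.min()) / (means.max() - means.min()) * 0.5

def pull(arm_index: int) -> int:
    """Observe a Bernoulli reward with success probability means[arm_index]."""
    return int(rng.random() < means[arm_index])

# Example: pull a few uniformly random arms, as a bandit algorithm
# (e.g., LINNASH or Thompson Sampling) would do over T rounds.
for t in range(5):
    a = int(rng.integers(n_arms))
    print(f"round {t}: arm {a}, reward {pull(a)}")
```

Rescaling the mean-reward vector rather than the embeddings keeps the sketch short; a faithful reproduction would apply the transformation to the arm vectors so that the linear structure ⟨x, θ⟩ is preserved for the learner.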