Bilinear Bandits with Low-rank Structure
Authors: Kwang-Sung Jun, Rebecca Willett, Stephen Wright, Robert Nowak
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conjecture that the regret bound of ESTR is unimprovable up to polylogarithmic factors, and our preliminary experiment shows that ESTR outperforms a na ıve linear bandit reduction. |
| Researcher Affiliation | Academia | 1Boston University 2University of Chicago 3University of Wisconsin-Madison. |
| Pseudocode | Yes | Algorithm 1 Low OFUL |
| Open Source Code | No | The paper mentions using a C implementation wrapped in Python for Opt Space (https://github.com/strin/pyOptSpace), which is a third-party tool, but does not provide a link or statement for their own implementation of ESTR or Low OFUL. |
| Open Datasets | No | The paper describes a simulation setup for its experiments: 'We draw 16 arms from the unit sphere for each arm set X and Z and simulate the bandit game for T = 104 iterations'. This is a synthetic data generation process, not a publicly available dataset. |
| Dataset Splits | No | The paper describes a simulation with synthetic data, not a pre-existing dataset with formal train/validation/test splits. It mentions 'tuning T1 by grid search' which implies some form of validation, but no explicit dataset splits are provided. |
| Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments (e.g., CPU, GPU models, or memory specifications). |
| Software Dependencies | No | The paper mentions using 'the authors C implementation that is wrapped in python' for Opt Space and refers to the 'Burer-Monteiro formulation', but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | We run our simulation with d1 = d2 = 8, r = 1, σ = 0.01. We set λ = 1 for both OFUL and Low OFUL. For the sake of comparison, we tune c by grid search and report the result with the smallest average regret. For ESTR, the value of T1 used in the proof involves some unknown constants; to account for this, we tune T1 by grid search. |