Bilinear Bandits with Low-rank Structure

Authors: Kwang-Sung Jun, Rebecca Willett, Stephen Wright, Robert Nowak

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We conjecture that the regret bound of ESTR is unimprovable up to polylogarithmic factors, and our preliminary experiment shows that ESTR outperforms a na ıve linear bandit reduction.
Researcher Affiliation Academia 1Boston University 2University of Chicago 3University of Wisconsin-Madison.
Pseudocode Yes Algorithm 1 Low OFUL
Open Source Code No The paper mentions using a C implementation wrapped in Python for Opt Space (https://github.com/strin/pyOptSpace), which is a third-party tool, but does not provide a link or statement for their own implementation of ESTR or Low OFUL.
Open Datasets No The paper describes a simulation setup for its experiments: 'We draw 16 arms from the unit sphere for each arm set X and Z and simulate the bandit game for T = 104 iterations'. This is a synthetic data generation process, not a publicly available dataset.
Dataset Splits No The paper describes a simulation with synthetic data, not a pre-existing dataset with formal train/validation/test splits. It mentions 'tuning T1 by grid search' which implies some form of validation, but no explicit dataset splits are provided.
Hardware Specification No The paper does not specify any particular hardware used for running the experiments (e.g., CPU, GPU models, or memory specifications).
Software Dependencies No The paper mentions using 'the authors C implementation that is wrapped in python' for Opt Space and refers to the 'Burer-Monteiro formulation', but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup Yes We run our simulation with d1 = d2 = 8, r = 1, σ = 0.01. We set λ = 1 for both OFUL and Low OFUL. For the sake of comparison, we tune c by grid search and report the result with the smallest average regret. For ESTR, the value of T1 used in the proof involves some unknown constants; to account for this, we tune T1 by grid search.