Improved Regret Bounds of Bilinear Bandits using Action Space Analysis

Authors: Kyoungseok Jang, Kwang-Sung Jun, Se-Young Yun, Wanmo Kang

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present an experiment comparing the performance of the existing bilinear algorithms and our rO-UCB algorithm. We consider four methods: ESTR-OS, the proposed method of Jun et al. (2019); ESTR-BM, the best heuristic method in Jun et al. (2019); OFUL, the naive OFUL extension as discussed in (1); and rO-UCB, our proposed algorithm (a hedged sketch of the bilinear setup and the naive OFUL reduction follows this table). We use grid search to tune the forced exploration time of ESTR and the confidence bound width of all algorithms. Instead of the true oracle, we used the low-rank approximation of Burer & Monteiro (2003) (see the factorization sketch after the table). The graph shows the best regret achieved by each algorithm. Our rO-UCB outperforms every other algorithm, and additional experiments in Appendix E.3 show that in another environment with larger σ, rO-UCB remains stable and outperforms the other algorithms, while the ESTR algorithms fail because of unsuccessful forced exploration.
Researcher Affiliation | Academia | (1) Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology, Daejeon, Korea; (2) Department of Computer Science, University of Arizona, Arizona, USA; (3) Graduate School of AI, Korea Advanced Institute of Science and Technology, Daejeon, Korea.
Pseudocode | Yes | Algorithm 1 ϵ-FALB ... Algorithm 2 SupLinUCB ... Algorithm 3 BaseLinUCB ... Algorithm 4 Phase Elimination ... Algorithm 5 rO-UCB
Open Source Code | No | The paper does not provide a specific link or an explicit statement about releasing the source code for the described methodology.
Open Datasets | No | The paper describes a simulated bandit problem setup for its experiments (Figure 1 and Section 5.1). It does not mention using a specific publicly available dataset (e.g., CIFAR-10, ImageNet) with a formal citation or link.
Dataset Splits | No | The paper conducts simulations of a bandit problem and evaluates performance by regret (Figure 1). It does not describe traditional dataset splits (e.g., 80/10/10) into training, validation, and test sets, as would be typical for machine learning models on fixed datasets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | We use grid search to tune the forced exploration time of ESTR and the confidence bound width of all algorithms (a hedged grid-search sketch follows the table). ... In the experiment, we consider the four methods ... with d = 8, r = 1, and σ = 0.01.
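For concreteness, here is a minimal sketch of the bilinear reward model (y_t = x_t^T Θ* z_t + noise) and of the naive OFUL reduction mentioned in the Research Type row, which vectorizes x z^T into a d^2-dimensional linear bandit. The settings d = 8, r = 1, σ = 0.01 come from Section 5.1; the arm-set size K, horizon T, ridge parameter lam, and confidence width beta are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: bilinear bandit environment plus the naive OFUL
# reduction on phi = vec(x z^T). Constants K, T, lam, beta are
# assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
d, r, sigma = 8, 1, 0.01          # dimensions and noise level from Section 5.1
K, T, lam = 16, 2000, 1.0         # assumed arm count, horizon, ridge parameter
beta = 1.0                        # confidence width (tuned by grid search in the paper)

# Rank-r hidden parameter Theta* = U V^T, normalized to unit spectral norm.
U = rng.standard_normal((d, r))
V = rng.standard_normal((d, r))
Theta = U @ V.T
Theta /= np.linalg.norm(Theta, 2)

# Finite arm sets on the unit sphere (an assumption; the paper's action
# spaces may differ).
X = rng.standard_normal((K, d)); X /= np.linalg.norm(X, axis=1, keepdims=True)
Z = rng.standard_normal((K, d)); Z /= np.linalg.norm(Z, axis=1, keepdims=True)

A = lam * np.eye(d * d)           # regularized Gram matrix for ridge regression
b = np.zeros(d * d)
best = max(x @ Theta @ z for x in X for z in Z)
regret = 0.0

for t in range(T):
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    # Choose the arm pair maximizing the optimistic (UCB) index.
    ucb_best, x_t, z_t = -np.inf, None, None
    for x in X:
        for z in Z:
            phi = np.outer(x, z).ravel()
            ucb = phi @ theta_hat + beta * np.sqrt(phi @ A_inv @ phi)
            if ucb > ucb_best:
                ucb_best, x_t, z_t = ucb, x, z
    y = x_t @ Theta @ z_t + sigma * rng.standard_normal()
    phi = np.outer(x_t, z_t).ravel()
    A += np.outer(phi, phi)       # rank-one update of the Gram matrix
    b += y * phi
    regret += best - x_t @ Theta @ z_t

print(f"cumulative regret after {T} rounds: {regret:.3f}")
```

This reduction ignores the rank-r structure entirely, which is why the paper treats it only as a baseline.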
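The Burer & Monteiro (2003) device used in place of the true oracle parameterizes a rank-constrained matrix as M = U V^T and optimizes the factors directly. The sketch below applies it to the simplest grounded instance, a Frobenius-norm rank-r approximation by gradient descent; the step size, iteration count, and initialization scale are illustrative assumptions, and the oracle subproblem in the paper may differ.

```python
# Hedged sketch of Burer-Monteiro factorization: gradient descent on
# the factors U, V of M ~ U V^T. Hyperparameters are assumptions.
import numpy as np

def burer_monteiro_approx(M, r, steps=2000, lr=0.1, seed=0):
    """Return U (d1 x r), V (d2 x r) with U @ V.T approximating M."""
    rng = np.random.default_rng(seed)
    d1, d2 = M.shape
    U = 0.1 * rng.standard_normal((d1, r))   # small random init
    V = 0.1 * rng.standard_normal((d2, r))
    for _ in range(steps):
        R = U @ V.T - M                      # residual of current factorization
        # Simultaneous gradient step on 0.5 * ||U V^T - M||_F^2.
        U, V = U - lr * R @ V, V - lr * R.T @ U
    return U, V

# Usage: recover a noisy rank-1 matrix, matching the d = 8, r = 1 setup.
rng = np.random.default_rng(1)
M = np.outer(rng.standard_normal(8), rng.standard_normal(8))
M /= np.linalg.norm(M, 2)                    # unit spectral norm, as in the sketch above
U, V = burer_monteiro_approx(M + 0.01 * rng.standard_normal((8, 8)), r=1)
print("relative error:", np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))
```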
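Finally, a sketch of the grid-search tuning described in the Experiment Setup row: sweep the forced exploration time (ESTR variants only) and the confidence width of every algorithm, then keep the configuration with the smallest regret. `run_experiment` is a hypothetical stand-in for one simulation run (e.g. the bandit loop sketched earlier), and both grids are assumptions, not the paper's values.

```python
# Hedged sketch of hyperparameter grid search over forced-exploration
# time and confidence width. Grids and the experiment stub are
# illustrative assumptions.
import itertools
import numpy as np

def run_experiment(algorithm, forced_exploration=None, beta=1.0, seed=0):
    """Hypothetical stand-in: replace with a full simulation run
    returning cumulative regret for the given configuration."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, 100.0)           # placeholder regret, not real results

forced_exploration_grid = [50, 100, 200, 400]   # assumed grid
beta_grid = [0.1, 0.5, 1.0, 2.0]                # assumed grid

best_regret = {}
for algo in ["ESTR-OS", "ESTR-BM", "OFUL", "rO-UCB"]:
    # Only the ESTR variants have a forced-exploration phase to tune.
    fe_grid = forced_exploration_grid if algo.startswith("ESTR") else [None]
    best_regret[algo] = min(
        run_experiment(algo, forced_exploration=fe, beta=beta)
        for fe, beta in itertools.product(fe_grid, beta_grid)
    )
print(best_regret)
```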