Improved Regret Bounds of Bilinear Bandits using Action Space Analysis

Authors: Kyoungseok Jang, Kwang-Sung Jun, Se-Young Yun, Wanmo Kang

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present an experiment comparing the performance of the existing bilinear algorithms and our rO-UCB algorithm. We consider four methods: ESTR-OS, the proposed method of Jun et al. (2019); ESTR-BM, the best heuristic method in Jun et al. (2019); OFUL, the naive OFUL extension as discussed in (1); and rO-UCB, our proposed algorithm (a hedged sketch of the bilinear setup and the naive OFUL reduction follows this table). We use grid search to tune the forced exploration time of ESTR and the confidence bound width of all algorithms. Instead of the true oracle, we used the low-rank approximation of Burer & Monteiro (2003) (see the factorization sketch after the table). The graph shows the best regret achieved by each algorithm. Our rO-UCB outperforms every other algorithm, and additional experiments in Appendix E.3 show that in another environment with larger σ, rO-UCB remains stable and outperforms the other algorithms, while the ESTR algorithms fail because of unsuccessful forced exploration.
Researcher Affiliation | Academia | (1) Department of Mathematical Sciences, Korea Advanced Institute of Science and Technology, Daejeon, Korea; (2) Department of Computer Science, University of Arizona, Arizona, USA; (3) Graduate School of AI, Korea Advanced Institute of Science and Technology, Daejeon, Korea.
Pseudocode | Yes | Algorithm 1 ϵ-FALB ... Algorithm 2 SupLinUCB ... Algorithm 3 BaseLinUCB ... Algorithm 4 Phase Elimination ... Algorithm 5 rO-UCB
Open Source Code | No | The paper does not provide a specific link or an explicit statement about releasing the source code for the described methodology.
Open Datasets | No | The paper describes a simulated bandit problem setup for its experiments (Figure 1 and Section 5.1). It does not mention using a specific publicly available dataset (e.g., CIFAR-10, ImageNet) with a formal citation or link.
Dataset Splits | No | The paper conducts simulations of a bandit problem and evaluates performance by regret (Figure 1). It does not describe traditional dataset splits (e.g., 80/10/10) into training, validation, and test sets, as would be typical for machine learning models on fixed datasets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | We use grid search to tune the forced exploration time of ESTR and the confidence bound width of all algorithms (a hedged grid-search sketch follows the table). ... In the experiment, we consider the four methods ... with d = 8, r = 1, and σ = 0.01.
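For concreteness, here is a minimal sketch of the bilinear reward model (y_t = x_t^T Θ* z_t + noise) and of the naive OFUL reduction mentioned in the Research Type row, which vectorizes x z^T into a d^2-dimensional linear bandit. The settings d = 8, r = 1, σ = 0.01 come from Section 5.1; the arm-set size K, horizon T, ridge parameter lam, and confidence width beta are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: bilinear bandit environment plus the naive OFUL
# reduction on phi = vec(x z^T). Constants K, T, lam, beta are
# assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
d, r, sigma = 8, 1, 0.01          # dimensions and noise level from Section 5.1
K, T, lam = 16, 2000, 1.0         # assumed arm count, horizon, ridge parameter
beta = 1.0                        # confidence width (tuned by grid search in the paper)

# Rank-r hidden parameter Theta* = U V^T, normalized to unit spectral norm.
U = rng.standard_normal((d, r))
V = rng.standard_normal((d, r))
Theta = U @ V.T
Theta /= np.linalg.norm(Theta, 2)

# Finite arm sets on the unit sphere (an assumption; the paper's action
# spaces may differ).
X = rng.standard_normal((K, d)); X /= np.linalg.norm(X, axis=1, keepdims=True)
Z = rng.standard_normal((K, d)); Z /= np.linalg.norm(Z, axis=1, keepdims=True)

A = lam * np.eye(d * d)           # regularized Gram matrix for ridge regression
b = np.zeros(d * d)
best = max(x @ Theta @ z for x in X for z in Z)
regret = 0.0

for t in range(T):
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    # Choose the arm pair maximizing the optimistic (UCB) index.
    ucb_best, x_t, z_t = -np.inf, None, None
    for x in X:
        for z in Z:
            phi = np.outer(x, z).ravel()
            ucb = phi @ theta_hat + beta * np.sqrt(phi @ A_inv @ phi)
            if ucb > ucb_best:
                ucb_best, x_t, z_t = ucb, x, z
    y = x_t @ Theta @ z_t + sigma * rng.standard_normal()
    phi = np.outer(x_t, z_t).ravel()
    A += np.outer(phi, phi)       # rank-one update of the Gram matrix
    b += y * phi
    regret += best - x_t @ Theta @ z_t

print(f"cumulative regret after {T} rounds: {regret:.3f}")
```

This reduction ignores the rank-r structure entirely, which is why the paper treats it only as a baseline.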
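The Burer & Monteiro (2003) device used in place of the true oracle parameterizes a rank-constrained matrix as M = U V^T and optimizes the factors directly. The sketch below applies it to the simplest grounded instance, a Frobenius-norm rank-r approximation by gradient descent; the step size, iteration count, and initialization scale are illustrative assumptions, and the oracle subproblem in the paper may differ.

```python
# Hedged sketch of Burer-Monteiro factorization: gradient descent on
# the factors U, V of M ~ U V^T. Hyperparameters are assumptions.
import numpy as np

def burer_monteiro_approx(M, r, steps=2000, lr=0.1, seed=0):
    """Return U (d1 x r), V (d2 x r) with U @ V.T approximating M."""
    rng = np.random.default_rng(seed)
    d1, d2 = M.shape
    U = 0.1 * rng.standard_normal((d1, r))   # small random init
    V = 0.1 * rng.standard_normal((d2, r))
    for _ in range(steps):
        R = U @ V.T - M                      # residual of current factorization
        # Simultaneous gradient step on 0.5 * ||U V^T - M||_F^2.
        U, V = U - lr * R @ V, V - lr * R.T @ U
    return U, V

# Usage: recover a noisy rank-1 matrix, matching the d = 8, r = 1 setup.
rng = np.random.default_rng(1)
M = np.outer(rng.standard_normal(8), rng.standard_normal(8))
M /= np.linalg.norm(M, 2)                    # unit spectral norm, as in the sketch above
U, V = burer_monteiro_approx(M + 0.01 * rng.standard_normal((8, 8)), r=1)
print("relative error:", np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))
```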
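Finally, a sketch of the grid-search tuning described in the Experiment Setup row: sweep the forced exploration time (ESTR variants only) and the confidence width of every algorithm, then keep the configuration with the smallest regret. `run_experiment` is a hypothetical stand-in for one simulation run (e.g. the bandit loop sketched earlier), and both grids are assumptions, not the paper's values.

```python
# Hedged sketch of hyperparameter grid search over forced-exploration
# time and confidence width. Grids and the experiment stub are
# illustrative assumptions.
import itertools
import numpy as np

def run_experiment(algorithm, forced_exploration=None, beta=1.0, seed=0):
    """Hypothetical stand-in: replace with a full simulation run
    returning cumulative regret for the given configuration."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, 100.0)           # placeholder regret, not real results

forced_exploration_grid = [50, 100, 200, 400]   # assumed grid
beta_grid = [0.1, 0.5, 1.0, 2.0]                # assumed grid

best_regret = {}
for algo in ["ESTR-OS", "ESTR-BM", "OFUL", "rO-UCB"]:
    # Only the ESTR variants have a forced-exploration phase to tune.
    fe_grid = forced_exploration_grid if algo.startswith("ESTR") else [None]
    best_regret[algo] = min(
        run_experiment(algo, forced_exploration=fe, beta=beta)
        for fe, beta in itertools.product(fe_grid, beta_grid)
    )
print(best_regret)
```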