Optimal Best-arm Identification in Linear Bandits

Authors: Yassir Jedra, Alexandre Proutiere

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental 5 Experiments We present here a few experimental results comparing the performance of our algorithm to that of RAGE, the state-of-the-art algorithm [16], in the case of finite set of arms. We compare LT&S and RAGE only because they outperform other existing algorithms. Further experimental results can be found in Appendix A.
Researcher Affiliation Academia Yassir Jedra KTH Stockholm, Sweden jedra@kth.se Alexandre Proutiere KTH Stockholm, Sweden alepro@kth.se
Pseudocode Yes Algorithm 1: Lazy Track-and-Stop (LT&S)
Open Source Code No The paper does not provide any explicit statement or link for open-source code for the described methodology.
Open Datasets No We use the following toy experiment which corresponds to the many arms example in [16]. d = 2 and A = {(1, 0), ej3π/4, ej(π/4+φi), i [n 2]} C where (φi) are i.i.d. N(0, 0.09). µ = (1, 0). The paper describes the setup but does not provide concrete access information (link, DOI, or specific citation for a public dataset with authors/year) for the data used in the experiments.
Dataset Splits No The paper does not specify any training, validation, or test dataset splits.
Hardware Specification No The paper does not provide specific hardware details (like GPU/CPU models or types of machines) used for running its experiments.
Software Dependencies No The paper mentions using 'Frank-Wolfe algorithm' but does not provide specific version numbers for any software, libraries, or programming languages used in the implementation.
Experiment Setup Yes Experimental set-up. We use the following toy experiment which corresponds to the many arms example in [16]. d = 2 and A = {(1, 0), ej3π/4, ej(π/4+φi), i [n 2]} C where (φi) are i.i.d. N(0, 0.09). µ = (1, 0). Experiments are made with the risk δ = 0.05. ... Implementation of LT&S. To update the allocation w(t), we use Frank-Wolfe algorithm [21]... We implement the exponential lazy update scheme T = {2k, k N}. The parameters of our stopping rule are c = c A0 d (so that after d steps the second condition of the stopping rule is satisfied) and u = 1; we use the threshold β(6δ/π2t2, t).