Optimal Best-arm Identification in Linear Bandits
Authors: Yassir Jedra, Alexandre Proutiere
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experiments We present here a few experimental results comparing the performance of our algorithm to that of RAGE, the state-of-the-art algorithm [16], in the case of finite set of arms. We compare LT&S and RAGE only because they outperform other existing algorithms. Further experimental results can be found in Appendix A. |
| Researcher Affiliation | Academia | Yassir Jedra KTH Stockholm, Sweden jedra@kth.se Alexandre Proutiere KTH Stockholm, Sweden alepro@kth.se |
| Pseudocode | Yes | Algorithm 1: Lazy Track-and-Stop (LT&S) |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-source code for the described methodology. |
| Open Datasets | No | We use the following toy experiment which corresponds to the many arms example in [16]. d = 2 and A = {(1, 0), ej3π/4, ej(π/4+φi), i [n 2]} C where (φi) are i.i.d. N(0, 0.09). µ = (1, 0). The paper describes the setup but does not provide concrete access information (link, DOI, or specific citation for a public dataset with authors/year) for the data used in the experiments. |
| Dataset Splits | No | The paper does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (like GPU/CPU models or types of machines) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Frank-Wolfe algorithm' but does not provide specific version numbers for any software, libraries, or programming languages used in the implementation. |
| Experiment Setup | Yes | Experimental set-up. We use the following toy experiment which corresponds to the many arms example in [16]. d = 2 and A = {(1, 0), ej3π/4, ej(π/4+φi), i [n 2]} C where (φi) are i.i.d. N(0, 0.09). µ = (1, 0). Experiments are made with the risk δ = 0.05. ... Implementation of LT&S. To update the allocation w(t), we use Frank-Wolfe algorithm [21]... We implement the exponential lazy update scheme T = {2k, k N}. The parameters of our stopping rule are c = c A0 d (so that after d steps the second condition of the stopping rule is satisfied) and u = 1; we use the threshold β(6δ/π2t2, t). |