Best-Arm Identification in Linear Bandits
Authors: Marta Soare, Alessandro Lazaric, Remi Munos
NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the performance of XY-Adaptive and compare it to the XY-Oracle strategy (Eq. 5), the static allocations XY and G, as well as with the fully-adaptive version of XY where X is updated at each round and the bound from Prop.2 is used. For a fixed confidence δ = 0.05, we compare the sampling budget needed to identify the best arm with probability at least 1 δ. We consider a set of arms X Rd, with |X| = d + 1 including the canonical basis (e1, . . . , ed) and an additional arm xd+1 = [cos(ω) sin(ω) 0 . . . 0] . We choose θ = [2 0 0 . . . 0] , and fix ω = 0.01, so that Δmin = (x1 xd+1) θ is much smaller than the other gaps. In this setting, an efficient sampling strategy should focus on reducing the uncertainty in the direction y = (x1 xd+1) by pulling the arm x2 = e2 which is almost aligned with y. In fact, from the rewards obtained from x2 it is easier to decrease the uncertainty about the second component of θ , that is precisely the dimension which allows to discriminate between x1 and xd+1. Also, we fix α = 1/10, and the noise ε N(0, 1). Each phase begins with an initialization matrix A0, obtained by pulling once each canonical arm. In Fig. 4 we report the sampling budget of the algorithms, averaged over 100 runs, for d = 2 . . . 10. |
| Researcher Affiliation | Collaboration | Marta Soare Alessandro Lazaric Rémi Munos INRIA Lille Nord Europe, Seque L Team {marta.soare,alessandro.lazaric,remi.munos}@inria.fr This work was done when the author was a visiting researcher at Microsoft Research New-England. Current affiliation: Google Deep Mind. |
| Pseudocode | Yes | Figure 2: Static allocation algorithms Figure 3: XY-Adaptive allocation algorithm |
| Open Source Code | No | The paper provides a link to a technical report (http://arxiv.org/abs/1409.6110), but it does not contain an explicit statement about the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper describes a synthetic setup for its experiments: 'We consider a set of arms X Rd, with |X| = d + 1 including the canonical basis (e1, . . . , ed) and an additional arm xd+1 = [cos(ω) sin(ω) 0 . . . 0] . We choose θ = [2 0 0 . . . 0] , and fix ω = 0.01, so that Δmin = (x1 xd+1) θ is much smaller than the other gaps.' This is a custom-generated environment rather than a publicly available or open dataset with provided access information. |
| Dataset Splits | No | The paper defines a synthetic experimental setup but does not specify train, validation, or test dataset splits (e.g., percentages or sample counts), nor does it reference predefined splits from standard benchmarks. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper does not mention any specific software dependencies or their version numbers required to reproduce the experiments. |
| Experiment Setup | Yes | For a fixed confidence δ = 0.05, we compare the sampling budget needed to identify the best arm with probability at least 1 δ. ... Also, we fix α = 1/10, and the noise ε N(0, 1). Each phase begins with an initialization matrix A0, obtained by pulling once each canonical arm. |