Factored Bandits
Authors: Julian Zimmert, Yevgeny Seldin
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our theoretical results with the first paper because it matches our problem assumptions. In our experiments, we provide a comparison to both the original algorithm and the KL version. The number of arms is set to 16 in both sets. We always fix ū − u = v̄ − v = 0.2 and vary the absolute values of ū and v̄. As expected, Rank1ElimKL has an advantage when the Bernoulli random variables are strongly biased towards one side. When the bias is close to 1/2, we clearly see the better constants of TEA. In the evaluation we clearly outperform Rank1Elim over different parameter settings and even beat the KL-optimized version if the means are not too close to zero or one. This supports that our algorithm not only provides a more practical anytime version of elimination, but also improves on the constant factors in the regret. Figure 2: Comparison of Rank1Elim, Rank1ElimKL, and TEA for K = L = 16. The results are averaged over 20 repetitions of the experiment. Figure 3: Comparison of dueling bandit algorithms with identical gaps of 0.4. The results are averaged over 20 repetitions of the experiment. |
| Researcher Affiliation | Academia | Julian Zimmert, University of Copenhagen (zimmert@di.ku.dk); Yevgeny Seldin, University of Copenhagen (seldin@di.ku.dk) |
| Pseudocode | Yes | Algorithm 1: Factored Bandit TEA; Algorithm 2: Dueling Bandit TEA; Algorithm 3: Temporary Elimination Module (TEM) Implementation (a generic elimination sketch follows the table) |
| Open Source Code | No | The paper does not provide any link or explicit statement about the release of its source code for the methodology described. |
| Open Datasets | No | The paper describes experimental comparisons with different settings (e.g., 'number of arms is set to 16', 'winning probability...set to 0.7'), and mentions using 'the framework provided by Komiyama et al. [9]', but does not specify any publicly available datasets used for training or provide access information for such datasets. |
| Dataset Splits | No | The paper describes empirical comparisons and experimental results, but it does not specify any training/validation/test dataset splits (e.g., percentage splits, sample counts, or references to predefined standard splits). |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware (e.g., GPU models, CPU models, memory details) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software components, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | The number of arms is set to 16 in both sets. We always fix ū − u = v̄ − v = 0.2 and vary the absolute values of ū and v̄. We have used the framework provided by Komiyama et al. [9]. We use the same utility for all sub-optimal arms. In Figure 3, the winning probability of the optimal arm over sub-optimal arms is always set to 0.7, and we run the experiment for different numbers of arms K. To show that there also exists a regime where the improved constants gain an advantage over RMED, we conducted a second experiment in Figure 4 (in the Appendix), where we set the winning probability to 0.952 and significantly increase the number of arms. (Minimal simulation sketches of both experimental setups follow the table.) |
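
As a concrete reading of the rank-1 setup quoted above, here is a minimal simulation sketch. It assumes the standard rank-1 bandit model (pulling arm pair (i, j) yields a Bernoulli reward with mean u[i]·v[j]); the class and parameter names (`RankOneEnv`, `u_bar`, `gap`, `seed`) are hypothetical, while K = L = 16 and the fixed per-factor gap of 0.2 come from the paper.

```python
import numpy as np

class RankOneEnv:
    """Rank-1 bandit: pulling arm pair (i, j) yields a Bernoulli(u[i] * v[j]) reward."""

    def __init__(self, K=16, L=16, u_bar=0.7, v_bar=0.7, gap=0.2, seed=0):
        self.rng = np.random.default_rng(seed)
        # One optimal entry per factor; every sub-optimal mean sits `gap`
        # below it, matching the fixed gap of 0.2 described above.
        self.u = np.full(K, u_bar - gap)
        self.u[0] = u_bar
        self.v = np.full(L, v_bar - gap)
        self.v[0] = v_bar

    def pull(self, i, j):
        """Sample a Bernoulli reward for the arm pair (i, j)."""
        return float(self.rng.random() < self.u[i] * self.v[j])

    def gap_of(self, i, j):
        """Instantaneous regret of playing (i, j) instead of the best pair."""
        return self.u.max() * self.v.max() - self.u[i] * self.v[j]
```

Sweeping `u_bar` and `v_bar` towards 0 or 1 reproduces the strongly biased regime where Rank1ElimKL is reported to win, while values near 0.5 correspond to the regime where TEA's better constants show.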
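For the dueling-bandit runs (Figure 3), a similarly minimal sketch under the stated assumptions: one optimal arm beats every sub-optimal arm with probability 0.7, and sub-optimal arms tie with each other since they share a single utility. `DuelingEnv` and its parameters are illustrative names, not the paper's.

```python
import numpy as np

class DuelingEnv:
    """Dueling bandit with one optimal arm (index 0) and identical
    sub-optimal arms, as in the Figure 3 setup described above."""

    def __init__(self, K=16, p_win=0.7, seed=0):
        self.K = K
        self.p_win = p_win
        self.rng = np.random.default_rng(seed)

    def duel(self, i, j):
        """Return True iff arm i wins the comparison against arm j."""
        if i == 0 and j != 0:
            p = self.p_win            # optimal arm wins with probability 0.7
        elif j == 0 and i != 0:
            p = 1.0 - self.p_win
        else:
            p = 0.5                   # identical utilities => fair coin
        return bool(self.rng.random() < p)
```

Setting `p_win=0.952` and increasing K would correspond to the Figure 4 regime mentioned in the Experiment Setup row.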
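The Pseudocode row above names Algorithms 1-3 but reproduces no detail, so the sketch below shows only the generic successive-elimination mechanic with Hoeffding-style confidence bounds. It is not the paper's Temporary Elimination Module, whose eliminations are only temporary and whose confidence levels are tuned for anytime guarantees; `successive_elimination` and its parameters are hypothetical.

```python
import numpy as np

def successive_elimination(pull, K, horizon, delta=0.05):
    """Generic elimination loop: play all active arms round-robin and drop
    any arm whose upper confidence bound falls below the best lower bound."""
    active = list(range(K))
    sums = np.zeros(K)
    counts = np.zeros(K)
    t = 0
    while t < horizon and len(active) > 1:
        for i in active:                      # one pull per active arm
            sums[i] += pull(i)
            counts[i] += 1
            t += 1
        mu = sums[active] / counts[active]
        # Hoeffding-style radius with a crude union bound over arms and time.
        rad = np.sqrt(np.log(4 * K * t ** 2 / delta) / (2 * counts[active]))
        keep = mu + rad >= np.max(mu - rad)
        active = [a for a, k in zip(active, keep) if k]
    return active
```

For example, `successive_elimination(lambda i: env.pull(i, 0), K=16, horizon=10_000)` would run it against one row of the rank-1 environment sketched above.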