Forced Exploration in Bandit Problems
Authors: Qi Han, Li Zhu, Fei Guo
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Furthermore, we compare our algorithm with popular bandit algorithms on different reward distributions. From the Experiments (Stationary Settings) section: In this section, we compare our method with other nonparametric bandit algorithms on Gaussian and Bernoulli distribution rewards. |
| Researcher Affiliation | Academia | School of Software Engineering, Xi'an Jiaotong University, 28 Xianning West Road, Xi'an, Shaanxi, 710049, China; {qihan19,co.fly}@stu.xjtu.edu.cn, zhuli@xjtu.edu.cn |
| Pseudocode | Yes | Algorithm 1 shows the pseudocode of our method. Algorithm 1: FE. Input: non-decreasing sequence {f(r)}, K arms, horizon T. Initialization: t = 1, r = 0, f(0) = 0; ∀i ∈ {1, ..., K}: p(i) = 0, flag(i) = 0. 1: while t < T do ... (a hedged Python sketch of this loop appears below the table) |
| Open Source Code | Yes | https://github.com/qh1874/ForceExplor |
| Open Datasets | No | The paper states that data is generated from Gaussian and Bernoulli distributions ('The means and variances of Gaussian distributions are randomly generated from uniform distribution: µ(i) ∼ U(0, 1), σ(i) ∼ U(0, 1). The means of Bernoulli distribution are also generated from U(0, 1).'), but it does not use or provide access information for a pre-existing, publicly available dataset. (This reward-generation setup is sketched below the table.) |
| Dataset Splits | No | The paper describes averaging results over multiple independent runs and reporting confidence intervals, but it does not specify explicit training, validation, or test splits (e.g., percentages or sample counts) of the kind used in supervised learning; rewards are generated dynamically rather than drawn from a static dataset. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., library names like PyTorch or TensorFlow with their respective versions) that were used in the experiments. |
| Experiment Setup | Yes | The time horizon is set as T = 100000. We fix the number of arms as K = 10. Following Remark 2, we set τ = √(T/(B_T log T)) for SW-FE-Exp, and τ = √(T log(T)/B_T) for SW-FE-Constant and SW-FE-Linear. (See the window-size computation below the table.) |
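The Pseudocode row truncates Algorithm 1 at its `while` loop, so below is a minimal Python sketch of a forced-exploration loop in the spirit of that fragment. It is an interpretation, not the authors' exact FE: the forcing rule (force-pull any arm that is unexplored or has been skipped for more than f(r) consecutive rounds, then advance r) and the names `forced_exploration` and `skipped` are our assumptions.

```python
import numpy as np

def forced_exploration(K, T, f, pull):
    """Greedy play with forced exploration.

    Illustrative reading of the quoted Algorithm 1 fragment, not the
    authors' exact method. f is a non-decreasing sequence with f(0) = 0;
    pull(i) returns a stochastic reward for arm i.
    """
    counts = np.zeros(K, dtype=int)   # pulls per arm
    means = np.zeros(K)               # empirical mean reward per arm
    skipped = np.zeros(K)             # rounds since arm i was last played
    r = 0                             # index into the forcing sequence f(r)
    for _ in range(T):
        # Assumed forcing rule: pull any arm that is still unexplored or
        # has been skipped for more than f(r) consecutive rounds.
        stale = [i for i in range(K) if counts[i] == 0 or skipped[i] > f(r)]
        if stale:
            arm = stale[0]
            r += 1                    # move to the next forcing threshold
        else:
            arm = int(np.argmax(means))  # otherwise play greedily
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
        skipped += 1
        skipped[arm] = 0
    return means, counts
```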
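The stationary environments quoted in the Open Datasets row are straightforward to regenerate. A sketch, with the variable names and seed chosen by us:

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seed is ours, for repeatability
K = 10                                # number of arms, from the setup row

# Gaussian arms: mu(i) ~ U(0, 1) and sigma(i) ~ U(0, 1), per the quoted text
mu = rng.uniform(0.0, 1.0, size=K)
sigma = rng.uniform(0.0, 1.0, size=K)
def gaussian_pull(i):
    return rng.normal(mu[i], sigma[i])

# Bernoulli arms: means drawn from U(0, 1)
p = rng.uniform(0.0, 1.0, size=K)
def bernoulli_pull(i):
    return float(rng.random() < p[i])
```

Combined with the sketch above, a run over the quoted horizon would look like `forced_exploration(K, T=100_000, f=lambda r: 2.0 ** r, pull=gaussian_pull)`; the exponential choice of f here simply mirrors the Exp variant named in the Experiment Setup row and is not prescribed by the excerpt.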
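The window sizes quoted in the Experiment Setup row reduce to two formulas. A one-liner each; B_T (the number of breakpoints in the nonstationary setting) is a placeholder value here, since the excerpt does not state it, and the placement of log(T) is our reconstruction of a garbled extraction:

```python
import math

T = 100_000   # horizon, from the quoted setup
B_T = 10      # breakpoints: placeholder, not given in the excerpt

tau_exp = math.sqrt(T / (B_T * math.log(T)))  # SW-FE-Exp
tau_sw = math.sqrt(T * math.log(T) / B_T)     # SW-FE-Constant and SW-FE-Linear
```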