Forced Exploration in Bandit Problems

Authors: Qi Han, Li Zhu, Fei Guo

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'Furthermore, we compare our algorithm with popular bandit algorithms on different reward distributions.' From the Experiments (Stationary Settings) section: 'In this section, we compare our method with other nonparametric bandit algorithms on Gaussian and Bernoulli distribution rewards.'
Researcher Affiliation | Academia | School of Software Engineering, Xi'an Jiaotong University, 28 Xianning West Road, Xi'an, Shaanxi, 710049, China; {qihan19,co.fly}@stu.xjtu.edu.cn, zhuli@xjtu.edu.cn
Pseudocode | Yes | Algorithm 1 shows the pseudocode of the method: 'Algorithm 1: FE. Input: non-decreasing sequence {f(r)}, K arms, horizon T. Initialization: t = 1, r = 0, f(0) = 0; for all i ∈ {1, ..., K}: p(i) = 0, flag(i) = 0. 1: while t < T do ...'
Open Source Code | Yes | https://github.com/qh1874/Force Explor
Open Datasets | No | The paper generates reward data from Gaussian and Bernoulli distributions ('The means and variances of Gaussian distributions are randomly generated from uniform distribution: µ(i) ~ U(0, 1), σ(i) ~ U(0, 1). The means of Bernoulli distribution are also generated from U(0, 1).'); it does not use, or provide access information for, a pre-existing publicly available dataset.
Dataset Splits | No | The paper averages results over multiple independent runs and reports confidence intervals, but it specifies no explicit training, validation, or test splits (e.g., percentages or sample counts) as is common in supervised learning; rewards are generated dynamically rather than drawn from a static dataset.
Hardware Specification | No | The paper does not describe the hardware (e.g., CPU or GPU model, memory) used to run the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., PyTorch or TensorFlow and their respective versions) used in the experiments.
Experiment Setup | Yes | 'The time horizon is set as T = 100000. We fix the number of arms as K = 10. Following Remark 2, we set τ = √(T / (B_T log T)) for SW-FE-Exp, and τ = √(T log(T) / B_T) for SW-FE-Constant and SW-FE-Linear.'
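The data-generation protocol quoted under Open Datasets is straightforward to reproduce. A minimal standard-library sketch (function names and the seeding convention are my own, not taken from the paper's code):

```python
import random

def make_gaussian_instance(K=10, seed=0):
    """Bandit instance with Gaussian rewards: mu(i), sigma(i) ~ U(0, 1)."""
    rng = random.Random(seed)
    mu = [rng.random() for _ in range(K)]
    sigma = [rng.random() for _ in range(K)]
    def pull(i):
        # Sample one reward from arm i's Gaussian.
        return rng.gauss(mu[i], sigma[i])
    return mu, sigma, pull

def make_bernoulli_instance(K=10, seed=0):
    """Bandit instance with Bernoulli rewards: mu(i) ~ U(0, 1)."""
    rng = random.Random(seed)
    mu = [rng.random() for _ in range(K)]
    def pull(i):
        # Sample a 0/1 reward with success probability mu(i).
        return 1.0 if rng.random() < mu[i] else 0.0
    return mu, pull
```

With K = 10 and T = 100000 as in the quoted setup, one experiment is just `pull(i)` called once per round by the bandit algorithm under test.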
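The audit quotes only the header of Algorithm 1 (FE), not its body. The sketch below illustrates the general forced-exploration idea that header suggests, under assumed rules: play greedily on empirical means p(i), but force-pull any arm whose pull count lags a non-decreasing schedule f(r). Both the forcing rule and the placeholder schedule are my assumptions, not the paper's exact algorithm:

```python
import math
import random

def forced_exploration(pull, K, T, f=lambda r: int(math.sqrt(r)) + 1, seed=0):
    """Greedy play with a forced-exploration schedule (generic sketch).

    An arm is force-pulled whenever its pull count falls below f(r),
    where r counts completed greedy steps; f(r) = floor(sqrt(r)) + 1 is
    a placeholder sublinear schedule, not the paper's choice.
    """
    rng = random.Random(seed)
    counts = [0] * K
    means = [0.0] * K  # empirical mean reward p(i)
    r = 0
    for t in range(T):
        under_explored = [i for i in range(K) if counts[i] < f(r)]
        if under_explored:
            i = rng.choice(under_explored)   # forced exploration
        else:
            i = max(range(K), key=lambda j: means[j])  # greedy exploitation
            r += 1
        x = pull(i)
        counts[i] += 1
        means[i] += (x - means[i]) / counts[i]  # incremental mean update
    return means, counts
```

Pair this with any reward sampler `pull(i)`, e.g. one of the synthetic Gaussian or Bernoulli instances described in the paper's experiments.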