Forced Exploration in Bandit Problems

Authors: Qi Han, Li Zhu, Fei Guo

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'Furthermore, we compare our algorithm with popular bandit algorithms on different reward distributions.' From the Experiments (Stationary Settings) section: 'In this section, we compare our method with other nonparametric bandit algorithms on Gaussian and Bernoulli distribution rewards.'
Researcher Affiliation | Academia | School of Software Engineering, Xi'an Jiaotong University, 28 Xianning West Road, Xi'an, Shaanxi, 710049, China; {qihan19,co.fly}@stu.xjtu.edu.cn, zhuli@xjtu.edu.cn
Pseudocode | Yes | Algorithm 1 shows the pseudocode of the method: 'Algorithm 1: FE. Input: non-decreasing sequence {f(r)}, K arms, horizon T. Initialization: t = 1, r = 0, f(0) = 0; for all i ∈ {1, ..., K}: p(i) = 0, flag(i) = 0. 1: while t < T do ...'
Open Source Code | Yes | https://github.com/qh1874/Force Explor
Open Datasets | No | The paper generates reward data from Gaussian and Bernoulli distributions ('The means and variances of Gaussian distributions are randomly generated from uniform distribution: µ(i) ~ U(0, 1), σ(i) ~ U(0, 1). The means of Bernoulli distribution are also generated from U(0, 1).'); it does not use, or provide access information for, a pre-existing publicly available dataset.
Dataset Splits | No | The paper averages results over multiple independent runs and reports confidence intervals, but it specifies no explicit training, validation, or test splits (e.g., percentages or sample counts) as is common in supervised learning; rewards are generated dynamically rather than drawn from a static dataset.
Hardware Specification | No | The paper does not describe the hardware (e.g., CPU or GPU model, memory) used to run the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., PyTorch or TensorFlow and their respective versions) used in the experiments.
Experiment Setup | Yes | 'The time horizon is set as T = 100000. We fix the number of arms as K = 10. Following Remark 2, we set τ = √(T / (B_T log T)) for SW-FE-Exp, and τ = √(T log(T) / B_T) for SW-FE-Constant and SW-FE-Linear.'
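The data-generation protocol quoted under Open Datasets is straightforward to reproduce. A minimal standard-library sketch (function names and the seeding convention are my own, not taken from the paper's code):

```python
import random

def make_gaussian_instance(K=10, seed=0):
    """Bandit instance with Gaussian rewards: mu(i), sigma(i) ~ U(0, 1)."""
    rng = random.Random(seed)
    mu = [rng.random() for _ in range(K)]
    sigma = [rng.random() for _ in range(K)]
    def pull(i):
        # Sample one reward from arm i's Gaussian.
        return rng.gauss(mu[i], sigma[i])
    return mu, sigma, pull

def make_bernoulli_instance(K=10, seed=0):
    """Bandit instance with Bernoulli rewards: mu(i) ~ U(0, 1)."""
    rng = random.Random(seed)
    mu = [rng.random() for _ in range(K)]
    def pull(i):
        # Sample a 0/1 reward with success probability mu(i).
        return 1.0 if rng.random() < mu[i] else 0.0
    return mu, pull
```

With K = 10 and T = 100000 as in the quoted setup, one experiment is just `pull(i)` called once per round by the bandit algorithm under test.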
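The audit quotes only the header of Algorithm 1 (FE), not its body. The sketch below illustrates the general forced-exploration idea that header suggests, under assumed rules: play greedily on empirical means p(i), but force-pull any arm whose pull count lags a non-decreasing schedule f(r). Both the forcing rule and the placeholder schedule are my assumptions, not the paper's exact algorithm:

```python
import math
import random

def forced_exploration(pull, K, T, f=lambda r: int(math.sqrt(r)) + 1, seed=0):
    """Greedy play with a forced-exploration schedule (generic sketch).

    An arm is force-pulled whenever its pull count falls below f(r),
    where r counts completed greedy steps; f(r) = floor(sqrt(r)) + 1 is
    a placeholder sublinear schedule, not the paper's choice.
    """
    rng = random.Random(seed)
    counts = [0] * K
    means = [0.0] * K  # empirical mean reward p(i)
    r = 0
    for t in range(T):
        under_explored = [i for i in range(K) if counts[i] < f(r)]
        if under_explored:
            i = rng.choice(under_explored)   # forced exploration
        else:
            i = max(range(K), key=lambda j: means[j])  # greedy exploitation
            r += 1
        x = pull(i)
        counts[i] += 1
        means[i] += (x - means[i]) / counts[i]  # incremental mean update
    return means, counts
```

Pair this with any reward sampler `pull(i)`, e.g. one of the synthetic Gaussian or Bernoulli instances described in the paper's experiments.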