Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits
Authors: Chen Wang
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further tested the empirical performance of our algorithms on simulated MAB instances, where the proposed algorithms outperform the benchmark uniform exploration algorithm by a large margin and, on occasion, reduce the regret by up to 70%. |
| Researcher Affiliation | Academia | Department of Computer Science, Rutgers University, USA. |
| Pseudocode | Yes | Aggressive Selective Promotion: an ε-best arm algorithm using log(K)-arm memory |
| Open Source Code | Yes | The code for the experiments is available on the GitHub page streaming-regret-minimization-MABs. |
| Open Datasets | No | The paper uses simulated Bernoulli arms and does not refer to a publicly available or open dataset. It describes the generation of its own synthetic data. |
| Dataset Splits | No | The paper describes simulation settings with varying K and T, each repeated over 50 runs, but does not provide specific train/validation/test dataset splits. The data is generated via simulation for each run. |
| Hardware Specification | Yes | The simulations are all carried out on a personal device with an Apple M1 chip and 8GB of memory. |
| Software Dependencies | No | The paper mentions implementing algorithms but does not specify any software dependencies with version numbers (e.g., Python version, specific libraries like NumPy, PyTorch, etc.). |
| Experiment Setup | Yes | The means of the reward distributions in each instance are sampled uniformly from [0, 1], and we include 50 runs in each setting. The number of arms is set to K = 50000, and the numbers of arm pulls tested are 1000K, 1000K^2, and 1000K^3. In the implementation of the different algorithms, we keep the leading constant at 1 (i.e., we treat the O(·) notation as having a multiplicative factor of 1), except for the multiplicative factor in the multi-level increment of samples (for which we use 1.2, since it has to be > 1). We also keep the same ε across levels (as opposed to using ε/2^ℓ), since the number of levels is small in our experiments. (See the illustrative sketch below the table.) |
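The linked GitHub repository is the authoritative reference for the experiments. As a rough illustration of the setup quoted in the Experiment Setup row, the following hypothetical Python sketch simulates Bernoulli instances with means drawn uniformly from [0, 1] and measures the regret of a uniform-exploration (explore-then-commit) baseline. The function names, the `explore_frac` parameter, and the toy values of K, T, and the number of runs are assumptions for illustration only and are not taken from the paper's code.

```python
# Hypothetical sketch of the simulated Bernoulli-MAB setup described above.
# Assumption (not from the paper's code): the uniform exploration baseline
# splits a fixed exploration budget equally across arms, then commits to the
# empirically best arm for the remaining pulls.
import numpy as np

def make_instance(K, rng):
    """Bernoulli instance with means drawn uniformly from [0, 1]."""
    return rng.uniform(0.0, 1.0, size=K)

def uniform_exploration_regret(means, T, explore_frac=0.5, rng=None):
    """Expected regret of explore-then-commit with a uniform exploration phase."""
    rng = rng or np.random.default_rng()
    K = len(means)
    best = means.max()
    pulls_per_arm = max(1, int(explore_frac * T) // K)
    # Exploration phase: pull every arm equally often and count successes.
    explore_rewards = rng.binomial(pulls_per_arm, means)
    regret = pulls_per_arm * (best - means).sum()
    # Commit phase: play the empirically best arm for the rest of the horizon.
    chosen = int(np.argmax(explore_rewards))
    remaining = T - pulls_per_arm * K
    regret += remaining * (best - means[chosen])
    return regret

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, T, runs = 50, 1000 * 50, 50  # small toy values, not the paper's grid
    regrets = [uniform_exploration_regret(make_instance(K, rng), T, rng=rng)
               for _ in range(runs)]
    print(f"mean regret over {runs} runs: {np.mean(regrets):.1f}")
```

In the paper's actual experiments the baseline would be replaced by the proposed streaming algorithms, the horizon would follow the 1000K, 1000K^2, 1000K^3 grid, and each setting would be averaged over 50 runs as described above.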