Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits

Authors: Chen Wang

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further tested the empirical performance of our algorithms on simulated MAB instances, where the proposed algorithms outperform the benchmark uniform exploration algorithm by a large margin and, on occasion, reduce the regret by up to 70%.
Researcher Affiliation | Academia | Department of Computer Science, Rutgers University, USA.
Pseudocode | Yes | Aggressive Selective Promotion: an ε-best arm algorithm using log(K)-arm memory.
Open Source Code | Yes | The code for the experiments is available on the GitHub page streaming-regret-minimization-MABs.
Open Datasets | No | The paper uses simulated Bernoulli arms and does not refer to a publicly available or open dataset; it describes the generation of its own synthetic data.
Dataset Splits | No | The paper describes simulation settings with varying K and T, each repeated 50 times, but does not provide train/validation/test dataset splits; the data are generated by simulation for each run.
Hardware Specification | Yes | The simulations are all carried out on a personal device with an Apple M1 chip and 8GB of memory.
Software Dependencies | No | The paper mentions implementing the algorithms but does not specify any software dependencies with version numbers (e.g., a Python version or specific libraries such as NumPy or PyTorch).
Experiment Setup | Yes | The means of the reward distributions in each instance are sampled uniformly from [0, 1], and we include 50 runs in each setting. The number of arms is set to K = 50000, and the number of arm pulls is tested with 1000K, 1000K^2, and 1000K^3. In the implementation of the different algorithms, we keep the leading constants to be 1 (i.e., we treat the O(·) notation as having a multiplicative factor of 1), except for the multiplicative factor in the multi-level increment of samples (where we use 1.2 instead, since it has to be > 1). We also keep the same ε across levels (as opposed to using ε/2^ℓ), since the number of levels is small in our experiments. (A code sketch of this setup follows the table.)
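
The Open Datasets and Experiment Setup rows above describe how the synthetic instances are built (Bernoulli arms with means sampled uniformly from [0, 1], 50 runs per setting) and that the proposed algorithms are compared against a uniform exploration benchmark. Below is a minimal sketch of that benchmark on such instances; the function names, the random seed, and the smaller K used in the example are illustrative assumptions and are not taken from the paper's released code.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_instance(K):
    """Sample a Bernoulli MAB instance: K arm means drawn uniformly from [0, 1]."""
    return rng.uniform(0.0, 1.0, size=K)

def uniform_exploration_regret(means, T):
    """Expected regret of pulling every arm (almost) equally often for T rounds.

    For Bernoulli arms the expected regret of a fixed allocation depends only on
    the means, so it is computed directly instead of sampling rewards.
    """
    K = len(means)
    pulls = np.full(K, T // K)   # round-robin allocation of the pull budget
    pulls[: T % K] += 1          # spread any leftover pulls over the first arms
    best_mean = means.max()
    return float(np.sum(pulls * (best_mean - means)))

# In the spirit of the reported setup: T = 1000*K pulls, averaged over 50
# independently sampled instances (a smaller K is used here for speed).
K, runs = 500, 50
T = 1000 * K
avg_regret = np.mean(
    [uniform_exploration_regret(make_instance(K), T) for _ in range(runs)]
)
print(f"uniform exploration, K={K}, T={T}: average expected regret = {avg_regret:.1f}")
```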
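
The Experiment Setup row also records two implementation choices: leading constants in the O(·) terms are set to 1, the multiplicative factor in the multi-level increment of samples is 1.2 (it must exceed 1), and the same ε is reused across levels instead of ε/2^ℓ. The sketch below only illustrates how such a per-level sample budget could be tabulated under those choices; the 1/ε² scaling and the helper name level_budgets are assumptions for illustration, not the paper's exact schedule.

```python
import math

def level_budgets(num_levels, eps, growth=1.2):
    """Per-level sample budgets with leading constant 1 and a fixed eps.

    Illustrative only: 1/eps^2 is the usual order of samples needed to estimate
    a Bernoulli mean to accuracy eps, and `growth` plays the role of the > 1
    multiplicative increment between consecutive levels mentioned in the paper.
    """
    return [math.ceil((growth ** level) / eps ** 2) for level in range(num_levels)]

print(level_budgets(num_levels=5, eps=0.1))  # [100, 120, 144, 173, 208]
```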