Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits
Authors: Chen Wang
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further tested the empirical performance of our algorithms on simulated MAB instances, where the proposed algorithms outperform the benchmark uniform exploration algorithm by a large margin and, on occasion, reduce the regret by up to 70%. |
| Researcher Affiliation | Academia | Department of Computer Science, Rutgers University, USA. |
| Pseudocode | Yes | Aggressive Selective Promotion: an ε-best arm algorithm using log(K)-arm memory |
| Open Source Code | Yes | The code for the experiments is available on the GitHub page streaming-regret-minimization-MABs. |
| Open Datasets | No | The paper uses simulated Bernoulli arms and does not refer to a publicly available or open dataset. It describes the generation of its own synthetic data. |
| Dataset Splits | No | The paper describes simulation settings with varying K and T, each repeated over 50 runs, but does not provide specific train/validation/test dataset splits. The data is generated via simulation for each run. |
| Hardware Specification | Yes | The simulations are all carried out on a personal device with an Apple M1 chip and 8GB of memory. |
| Software Dependencies | No | The paper mentions implementing algorithms but does not specify any software dependencies with version numbers (e.g., Python version, specific libraries like NumPy, PyTorch, etc.). |
| Experiment Setup | Yes | The means of the reward distributions in each instance are sampled uniformly from [0, 1], and we include 50 runs in each setting. The number of arms is set to K = 50000, and the numbers of arm pulls tested are 1000K, 1000K^2, and 1000K^3. In the implementation of the different algorithms, we keep the leading constant at 1 (i.e., we treat the O(·) notation as having a multiplicative factor of 1), except for the multiplicative factor in the multi-level increment of samples (for which we use 1.2, since it has to be > 1). We also keep the same ε across levels (as opposed to using ε/2^ℓ), since the number of levels is small in our experiments. (See the illustrative sketch below the table.) |
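The linked GitHub repository is the authoritative reference for the experiments. As a rough illustration of the setup quoted in the Experiment Setup row, the following hypothetical Python sketch simulates Bernoulli instances with means drawn uniformly from [0, 1] and measures the regret of a uniform-exploration (explore-then-commit) baseline. The function names, the `explore_frac` parameter, and the toy values of K, T, and the number of runs are assumptions for illustration only and are not taken from the paper's code.

```python
# Hypothetical sketch of the simulated Bernoulli-MAB setup described above.
# Assumption (not from the paper's code): the uniform exploration baseline
# splits a fixed exploration budget equally across arms, then commits to the
# empirically best arm for the remaining pulls.
import numpy as np

def make_instance(K, rng):
    """Bernoulli instance with means drawn uniformly from [0, 1]."""
    return rng.uniform(0.0, 1.0, size=K)

def uniform_exploration_regret(means, T, explore_frac=0.5, rng=None):
    """Expected regret of explore-then-commit with a uniform exploration phase."""
    rng = rng or np.random.default_rng()
    K = len(means)
    best = means.max()
    pulls_per_arm = max(1, int(explore_frac * T) // K)
    # Exploration phase: pull every arm equally often and count successes.
    explore_rewards = rng.binomial(pulls_per_arm, means)
    regret = pulls_per_arm * (best - means).sum()
    # Commit phase: play the empirically best arm for the rest of the horizon.
    chosen = int(np.argmax(explore_rewards))
    remaining = T - pulls_per_arm * K
    regret += remaining * (best - means[chosen])
    return regret

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, T, runs = 50, 1000 * 50, 50  # small toy values, not the paper's grid
    regrets = [uniform_exploration_regret(make_instance(K, rng), T, rng=rng)
               for _ in range(runs)]
    print(f"mean regret over {runs} runs: {np.mean(regrets):.1f}")
```

In the paper's actual experiments the baseline would be replaced by the proposed streaming algorithms, the horizon would follow the 1000K, 1000K^2, 1000K^3 grid, and each setting would be averaged over 50 runs as described above.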