Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Tight Regret Bounds for Single-pass Streaming Multi-armed Bandits
Authors: Chen Wang
ICML 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further tested the empirical performances of our algorithms on simulated MABs instances, where the proposed algorithms outperform the benchmark uniform exploration algorithm by a large margin and, on occasion, reduce the regret by up to 70%. |
| Researcher Affiliation | Academia | Department of Computer Science, Rutgers University, USA. |
| Pseudocode | Yes | Aggressive Selective Promotion an ε-best arm algorithm using log (K)-arm memory |
| Open Source Code | Yes | The codes of the experiment are available on github page streaming-regret-minimization-MABs. |
| Open Datasets | No | The paper uses simulated Bernoulli arms and does not refer to a publicly available or open dataset. It describes the generation of its own synthetic data. |
| Dataset Splits | No | The paper describes simulation settings with varying K and T, and running 50 times, but does not provide specific train/validation/test dataset splits. The data is generated via simulation for each run. |
| Hardware Specification | Yes | The simulations are all carried on a personal device with Apple M1 chip and 8GB memory |
| Software Dependencies | No | The paper mentions implementing algorithms but does not specify any software dependencies with version numbers (e.g., Python version, specific libraries like NumPy, PyTorch, etc.). |
| Experiment Setup | Yes | The means of the reward distributions in each instance are sampled uniformly from [0, 1], and we include 50 runs in each setting. The number of arms is set to K = 50000, and the number of arm pulls are tested with 1000K, 1000K2 and 1000K3. In the implementation of different algorithms, we keep the leading constant to be 1 (i.e. we treat O( ) operation as with multiplicative factor of 1) except for multiplicative factor in the multi-level increment of samples (which we use 1.2 instead since it has to be > 1). We also keep the same ε across levels (as opposed to using ε/2ℓ) since the number of levels is small in our experiments. |