Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback

Authors: Zongqi Wan, Xiaoming Sun, Jialin Zhang

Venue: IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We show that the non-oblivious setting incurs Ω(T) pseudo regret even when the loss sequence has bounded memory. However, we propose a wrapper algorithm which enjoys o(T) policy regret on many adversarial bandit problems under the assumption that the loss sequence has bounded memory. In particular, for the K-armed bandit and bandit convex optimization, we obtain an O(T^{2/3}) policy regret bound. We also prove a matching lower bound for the K-armed bandit.
Researcher Affiliation | Academia | Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Pseudocode | Yes | Algorithm 1: Mini-batch wrapper (an illustrative sketch of this batching idea follows the table).
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper is theoretical, focusing on proving bounds and developing algorithms rather than empirical training; no dataset information is provided.
Dataset Splits | No | The paper is theoretical and does not involve empirical validation with datasets; no validation splits are mentioned.
Hardware Specification | No | The paper is theoretical and does not describe any experimental setup that would require hardware specifications.
Software Dependencies | No | The paper is theoretical and does not describe any experimental setup that would require specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with specific hyperparameters or training configurations.
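A common way to realize the "Mini-batch wrapper" named in the Pseudocode row, and plausibly the idea behind Algorithm 1, is batching: the base bandit algorithm picks an action, the wrapper repeats that action for a whole batch of rounds, and the base algorithm is updated with the aggregated loss of the batch, the intuition being that bounded-memory composite anonymous delayed losses mostly land inside the batch of the action that caused them. The sketch below illustrates this idea around an Exp3-style K-armed base learner. It is an assumption-laden reconstruction, not the paper's Algorithm 1: the class names (Exp3Base, MiniBatchWrapper), the choice of Exp3 as the base learner, the batch size of roughly T^{1/3}, and the toy loss oracle are all illustrative.

```python
# Illustrative sketch of a mini-batch wrapper around an Exp3-style K-armed
# bandit learner. NOT the paper's Algorithm 1: the Exp3 base learner, class
# names, batch-size tuning, and the toy loss oracle are assumptions.
import math
import random


class Exp3Base:
    """Standard Exp3 over K arms with losses assumed to lie in [0, 1]."""

    def __init__(self, n_arms, learning_rate):
        self.n_arms = n_arms
        self.learning_rate = learning_rate
        self.weights = [1.0] * n_arms

    def select_arm(self):
        total = sum(self.weights)
        probs = [w / total for w in self.weights]
        arm = random.choices(range(self.n_arms), weights=probs, k=1)[0]
        return arm, probs[arm]

    def update(self, arm, loss, prob):
        # Importance-weighted estimate: only the played arm's loss is observed.
        estimated_loss = loss / prob
        self.weights[arm] *= math.exp(-self.learning_rate * estimated_loss)


class MiniBatchWrapper:
    """Repeats each arm chosen by the base learner for `batch_size` rounds and
    feeds back the average loss observed over that batch. With bounded-memory
    composite anonymous delays, most delayed loss pieces fall inside the batch
    of the arm that generated them, so the average is a usable feedback signal."""

    def __init__(self, base_learner, batch_size):
        self.base = base_learner
        self.batch_size = batch_size

    def run(self, horizon, loss_oracle):
        t = 0
        while t < horizon:
            arm, prob = self.base.select_arm()
            steps = min(self.batch_size, horizon - t)
            batch_loss = 0.0
            for _ in range(steps):
                # loss_oracle(t, arm): the composite (possibly delayed) loss
                # the learner observes at round t after playing `arm`.
                batch_loss += loss_oracle(t, arm)
                t += 1
            self.base.update(arm, batch_loss / steps, prob)


if __name__ == "__main__":
    T, K = 10_000, 5
    batch_size = max(1, round(T ** (1 / 3)))  # tau of order T^{1/3}
    n_batches = T / batch_size
    eta = math.sqrt(2 * math.log(K) / (n_batches * K))
    wrapper = MiniBatchWrapper(Exp3Base(K, eta), batch_size)
    # Toy oracle: arm 0 is better in expectation; delays are not modeled here.
    wrapper.run(T, lambda t, arm: random.random() * (0.5 if arm == 0 else 1.0))
```

Setting the batch size to roughly T^{1/3} is the usual trade-off for wrappers of this kind: the base learner sees about T^{2/3} meta-rounds, and each meta-round costs at most about T^{1/3} actual rounds, which is consistent with the O(T^{2/3}) policy regret bound quoted in the Research Type row; the exact tuning, constants, and conditions are given in the paper.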