Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback
Authors: Zongqi Wan, Xiaoming Sun, Jialin Zhang
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We show that the non-oblivious setting incurs Ω(T) pseudo-regret even when the loss sequence has bounded memory. However, we propose a wrapper algorithm that enjoys o(T) policy regret on many adversarial bandit problems, under the assumption that the loss sequence has bounded memory. In particular, for the K-armed bandit and bandit convex optimization, we obtain an O(T^{2/3}) policy regret bound. We also prove a matching lower bound for the K-armed bandit. |
| Researcher Affiliation | Academia | Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China |
| Pseudocode | Yes | Algorithm 1 Mini-batch wrapper |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and focuses on proving bounds and developing algorithms, not on empirical training with datasets. No dataset information is provided for training. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical validation with datasets. No validation dataset splits are mentioned. |
| Hardware Specification | No | The paper is theoretical and does not describe any experimental setup that would require hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not describe any experimental setup that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with specific hyperparameters or training configurations. |
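The O(T^{2/3}) policy regret rate reported above can be illustrated with the generic mini-batching trade-off. This is a sketch under stated assumptions, not the paper's proof. Assume the following:

- a base bandit algorithm with regret at most c√n over n batches of averaged losses in [0, 1];
- a batch size τ;
- each batch boundary costs at most m + d extra loss, where m is the memory length and d is the feedback delay. This per-boundary cost is an assumption made here.

The notation PolicyRegret(T) is likewise introduced only for illustration.

```latex
\mathrm{PolicyRegret}(T)
  \;\le\; \tau \cdot c\sqrt{T/\tau} \;+\; (m+d)\,\frac{T}{\tau}
  \;=\; c\sqrt{\tau T} \;+\; (m+d)\,\frac{T}{\tau},
\qquad
\tau = T^{1/3} \;\Longrightarrow\;
\mathrm{PolicyRegret}(T) \;\le\; (c + m + d)\,T^{2/3}.
```

With EXP3 as the base learner, c = O(√(K log K)). Under these assumptions the bound is therefore O(T^{2/3}) for fixed K, m and d.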
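The table lists "Algorithm 1 Mini-batch wrapper" as the paper's pseudocode. Below is a minimal sketch of the general mini-batching idea, written under several assumptions. It is not the paper's Algorithm 1:

- EXP3 is used as the base learner.
- Composite anonymous delayed feedback is not modeled; the learner sees each round's loss directly.
- The memory-1 adversary `example_loss` and the names `Exp3` and `minibatch_wrapper` are hypothetical.

Each action chosen by the base learner is played for τ consecutive rounds. The batch-average loss is then fed back as a single bandit observation.

```python
import math
import random
from typing import Callable, List


class Exp3:
    """EXP3 base learner for K arms with losses in [0, 1] (assumed base algorithm)."""

    def __init__(self, k: int, eta: float, rng: random.Random) -> None:
        self.k = k
        self.eta = eta
        self.rng = rng
        self.cum_est = [0.0] * k  # importance-weighted cumulative loss estimates
        self.last_arm = 0
        self.last_prob = 1.0 / k

    def probs(self) -> List[float]:
        # Exponential weights; subtract the minimum for numerical stability.
        lo = min(self.cum_est)
        w = [math.exp(-self.eta * (c - lo)) for c in self.cum_est]
        s = sum(w)
        return [x / s for x in w]

    def select(self) -> int:
        p = self.probs()
        arm = self.rng.choices(range(self.k), weights=p)[0]
        self.last_arm, self.last_prob = arm, p[arm]
        return arm

    def update(self, loss: float) -> None:
        # Unbiased estimate: only the played arm's estimate changes.
        self.cum_est[self.last_arm] += loss / self.last_prob


def minibatch_wrapper(loss_fn: Callable[[int, List[int]], float],
                      k: int, horizon: int, tau: int, seed: int = 0) -> List[int]:
    """Hypothetical wrapper: hold each base action for tau rounds, feed back the batch average."""
    rng = random.Random(seed)
    n_batches = math.ceil(horizon / tau)
    eta = math.sqrt(2 * math.log(k) / (n_batches * k))  # tuned for n_batches base rounds
    base = Exp3(k, eta, rng)
    history: List[int] = []
    t = 0
    while t < horizon:
        arm = base.select()
        batch_len = min(tau, horizon - t)
        total = 0.0
        for _ in range(batch_len):
            history.append(arm)
            total += loss_fn(arm, history)  # adversary may look at recent actions
            t += 1
        base.update(total / batch_len)  # one bandit observation per batch
    return history


def example_loss(arm: int, history: List[int]) -> float:
    # Hypothetical memory-1 adversary: arm 0 is best, and switching arms costs 0.5.
    base = 0.0 if arm == 0 else 1.0
    switched = len(history) >= 2 and history[-2] != arm
    return min(1.0, base + 0.5 * switched)


if __name__ == "__main__":
    T, K = 3000, 3
    tau = math.ceil(T ** (1 / 3))  # batch size of order T^{1/3}
    plays = minibatch_wrapper(example_loss, K, T, tau, seed=1)
    print("fraction of rounds on arm 0:", plays.count(0) / T)
```

Holding each action for a whole batch limits the number of switches to about T/τ. This is what makes a bounded-memory adversary's dependence on past actions affordable in the sketch.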