Optimal Streaming Algorithms for Multi-Armed Bandits
Authors: Tianyuan Jin, Keke Huang, Jing Tang, Xiaokui Xiao
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper studies two variants of the best arm identification (BAI) problem under the streaming model, where we have a stream of n arms with reward distributions supported on [0, 1] with unknown means. We propose an algorithm that works for any k and achieves the optimal sample complexity O((n/ε²) log(k/δ)) using a single-arm memory and a single pass of the stream. We also present a single-arm-memory algorithm that achieves a near instance-dependent optimal sample complexity within O(log(1/Δ₂)) passes, where Δ₂ is the gap between the mean of the best arm and that of the second best arm. Let arm_o be the selected arm, and arm_i be the i-th arm in the stream, where i ∈ {1, 2, ..., n}. Let µ_arm and µ̂_arm be arm's true mean and estimated mean, respectively, and arm* be the best arm. In the first step, we initialize arm_o with arm_1. When arm_i arrives, we compare arm_i with arm_o and decide whether arm_i should become the new arm_o. In particular, our algorithm mainly consists of the following two operations. 1. Sampling: we pull each arrived arm Θ((1/ε²) log(1/δ)) times to estimate its true mean. 2. Comparison: we replace arm_o with arm_i if µ̂_{arm_i} ≥ µ̂_{arm_o} + α, where α = Θ(ε) is a random variable following a predefined distribution. (A minimal code sketch of these two operations appears after the table.) Our algorithm maintains the following property. Property 1: whenever we update arm_o, we always ensure that \|µ̂_{arm_o} − µ_{arm_o}\| ≤ O(ε). We present our solution for the streaming ε-BAI problem, i.e., ε-KAI with k = 1, and then extend it to address the ε-KAI problem for the general case of k in Section 3. Algorithm 1 presents the pseudo-code of our algorithm. Theorem 1: given a stream of n arms, approximation parameter ε, and confidence parameter δ in (0, 1), Algorithm 1 finds the ε-best arm with probability at least 1 − δ using expected O((n/ε²) log(1/δ)) pulls and a single-arm memory. The proof of the optimal sample complexity is highly nontrivial and is one of our main technical contributions; we refer readers to Appendix A for details. Proof of Lemma 1: we prove the lemma in three steps. Based on Lemma 1, we are then ready to establish the correctness of Algorithm 1. The key idea is to bound the total expected number of pulls during the life cycle of each selected arm_o, i.e., the period from arm_o replacing its predecessor to arm_o being replaced by its successor. Algorithm 2 presents the pseudo-code for ε-KAI. Our main result is formalized in the following theorem. Theorem 2: given a stream of n arms, approximation parameter ε, and confidence parameter δ in (0, 1), Algorithm 2 finds the ε-top-k arms with probability at least 1 − δ using expected O((n/ε²) log(k/δ)) pulls and a single-arm memory. Detailed proofs are in Appendix B. Algorithm 3 presents the pseudo-code to address the streaming BAI problem; our main result for Algorithm 3 is formalized in the following theorem. Theorem 3: given a stream of n arms and confidence parameter δ ∈ (0, 1), Algorithm 3 identifies the optimal arm with probability at least 1 − δ, in which case it takes expected O(Σ_{i=2}^{n} (1/Δ_i²)(log(1/δ) + log log(1/Δ_i))) arm pulls and O(log(1/Δ₂)) passes using a single-arm memory. Detailed proofs are in Appendix C. |
| Researcher Affiliation | Academia | Tianyuan Jin¹, Keke Huang¹, Jing Tang², Xiaokui Xiao¹. ¹School of Computing, National University of Singapore, Singapore. ²Data Science and Analytics Thrust, The Hong Kong University of Science and Technology, Guangzhou, China. Correspondence to: Xiaokui Xiao <xkxiao@nus.edu.sg>. |
| Pseudocode | Yes | Algorithm 1: Streaming ε-BAI, Algorithm 2: Streaming ε-KAI, Algorithm 3: ID-BAI |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper focuses on theoretical algorithms and their analysis for abstract "streams of n arms with reward distributions." It does not conduct experiments on specific, named, publicly available datasets. |
| Dataset Splits | No | The paper presents theoretical algorithms and their performance guarantees, not empirical evaluations on datasets. Therefore, it does not specify dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is a theoretical work focusing on algorithm design and analysis. It does not describe any computational experiments or simulations that would require specific hardware, thus no hardware specifications are mentioned. |
| Software Dependencies | No | The paper focuses on the design and theoretical analysis of algorithms. It does not mention any specific software dependencies or library versions required for implementation or reproduction. |
| Experiment Setup | No | The paper provides algorithmic parameters (e.g., ε, δ, C, sℓ, τj) that are part of the algorithm's definition and theoretical analysis, rather than hyperparameters or system-level settings for an empirical experiment setup. |
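
The sampling and comparison operations quoted in the Research Type row are concrete enough to sketch in code. Below is a minimal Python sketch of the single-pass, single-arm-memory streaming ε-BAI loop described in the excerpt; it is an illustration under stated assumptions, not the paper's exact Algorithm 1. The per-arm budget constant `c` and the uniform distribution used for the margin α are hypothetical placeholders for the paper's Θ(·) choices, and the sketch omits refinements such as re-sampling the stored arm.

```python
import math
import random

def stream_eps_bai(arms, eps, delta, c=4.0):
    """Sketch of the single-pass, single-arm-memory streaming ε-BAI loop.

    arms:  iterable of zero-argument callables; each call returns one
           reward in [0, 1] from that arm (the "stream" of n arms).
    eps:   approximation parameter ε in (0, 1).
    delta: confidence parameter δ in (0, 1).
    c:     placeholder constant hiding the Θ(·) factor from the paper.
    """
    # Sampling: pull each arrived arm Θ((1/ε²) log(1/δ)) times.
    budget = math.ceil(c / eps**2 * math.log(1.0 / delta))
    kept_arm, kept_mean = None, float("-inf")
    for pull in arms:  # a single pass over the stream
        mean_hat = sum(pull() for _ in range(budget)) / budget
        # Comparison: replace the stored arm only if the newcomer wins by a
        # random margin α = Θ(ε) (placeholder: uniform on [ε/4, ε/2]).
        alpha = random.uniform(eps / 4.0, eps / 2.0)
        if kept_arm is None or mean_hat >= kept_mean + alpha:
            kept_arm, kept_mean = pull, mean_hat  # single-arm memory
    return kept_arm

# Toy usage: five Bernoulli arms with the given means.
means = [0.30, 0.50, 0.45, 0.90, 0.70]
arms = [lambda p=p: float(random.random() < p) for p in means]
best = stream_eps_bai(arms, eps=0.05, delta=0.1)
```

The random margin α is what makes the accounting in the quoted excerpt work: it lets the analysis maintain Property 1 (\|µ̂_{arm_o} − µ_{arm_o}\| ≤ O(ε) for every stored arm) while bounding the expected number of pulls spent over each stored arm's life cycle.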