Optimal Batched Linear Bandits

Authors: Xuanfei Ren, Tianyuan Jin, Pan Xu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct thorough experiments to evaluate our algorithm on randomly generated instances and the challenging End of Optimism instances (Lattimore & Szepesvari, 2017), which were shown to be hard to learn for optimism-based algorithms. Empirical results show that E4 consistently outperforms baseline algorithms with respect to regret minimization, batch complexity, and computational efficiency.
Researcher Affiliation | Collaboration | 1 University of Science and Technology of China, 2 National University of Singapore, 3 Duke University. Correspondence to: Pan Xu <pan.xu@duke.edu>.
Pseudocode | Yes | Algorithm 1: Explore, Estimate, Eliminate, and Exploit (E4).
Open Source Code | Yes | The implementation can be found at https://github.com/panxulab/optimal-batched-linear-bandits.
Open Datasets | No | No public dataset links, DOIs, or formal citations are provided for the 'End of Optimism instances' or the 'randomly generated instances'. The paper describes how they are constructed (e.g., 'Let e_j^d denote the j-th natural basis vector in R^d... Then we construct the following hard instances inspired by the End of Optimism (Lattimore & Szepesvari, 2017).'). While Lattimore & Szepesvari (2017) is cited, the citation covers the construction idea, not a directly downloadable dataset for reproduction.
Dataset Splits | No | No training/validation/test splits are mentioned. The paper reports '10 simulations' for each instance but specifies no data splits.
Hardware Specification | No | No specific hardware (GPU models, CPU models, memory, etc.) is mentioned for running the experiments; the paper focuses on the algorithms and their performance.
Software Dependencies | No | No software dependencies with version numbers are listed. The paper describes how the baselines are implemented (e.g., 'we implement the rarely switching OFUL algorithm', 'we follow their computationally efficient variant') but does not document the software environment or its versions.
Experiment Setup | Yes | For the parameters in E4 (Algorithm 1), we fix the exploration rate to be T_1. We set the other parameters as described in Theorem 5.4. For rs-OFUL, we implement the rarely switching OFUL algorithm of Abbasi-Yadkori et al. (2011) with switching parameter C = 0.5. For End OA, we follow the Warmup and Success phases exactly as Lattimore & Szepesvari (2017) describe, and use OFUL in its Recovery phase. For IDS, we follow their computationally efficient variant (Kirschner et al., 2021, Section 2) step by step. For Pha Elim D, we follow the framework in Esfandiari et al. (2021) and improve the batch-size design by using exploration rate T_i = T^(1-1/2^i) instead of q_i in their paper for better performance.
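The quoted excerpt above describes constructing hard instances from natural basis vectors. A minimal, hypothetical construction in the spirit of Lattimore & Szepesvari (2017) can be sketched as follows; the function name, the choice of eps, and the exact arm set are illustrative assumptions, not the paper's actual instance family.

```python
import numpy as np

def end_of_optimism_instance(eps, d=2):
    """Hypothetical hard instance *inspired by* the End of Optimism.

    theta* = e_1; arms are e_1, a clearly suboptimal e_2, and a
    near-optimal arm (1 - eps) * e_1. Small eps makes the top two arms
    hard to separate, the regime where optimism-based algorithms are
    known to struggle. Illustrative only, not the paper's exact design.
    """
    basis = np.eye(d)  # rows are the natural basis vectors e_1, ..., e_d
    arms = np.stack([basis[0], basis[1], (1.0 - eps) * basis[0]])
    theta_star = basis[0]
    return arms, theta_star

arms, theta_star = end_of_optimism_instance(eps=0.05)
expected_rewards = arms @ theta_star  # per-arm means: 1.0, 0.0, 0.95
```

With eps = 0.05 the gap between the best arm and its near-duplicate is exactly eps, so an algorithm must invest enough exploration to separate them.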
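The improved batch-size design quoted for Pha Elim D uses the grid T_i = T^(1-1/2^i). A small sketch of that schedule, assuming cumulative batch endpoints and with the function name and rounding as illustrative choices:

```python
import math

def batch_grid(T, num_batches):
    """Cumulative exploration-rate grid T_i = T^(1 - 1/2^i), i = 1, 2, ...

    Mirrors the schedule quoted in the experiment setup; rounding to
    integer round counts is an assumption made here for concreteness.
    """
    return [round(T ** (1.0 - 2.0 ** (-i))) for i in range(1, num_batches + 1)]

schedule = batch_grid(10_000, 4)  # [100, 1000, 3162, 5623]
```

The grid doubles the exponent's distance to 1 at each step, so only O(log log T) batches are needed before T_i reaches the horizon T, which is what keeps the batch complexity small.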
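Algorithm 1's name, Explore, Estimate, Eliminate, and Exploit, describes a four-phase batched pattern. The skeleton below is a schematic of that generic pattern only: the batch grid, round-robin exploration, and elimination radius are placeholders and do NOT reproduce the paper's actual parameter choices from Theorem 5.4.

```python
import numpy as np

def e4_sketch(arms, pull, T, num_batches=4):
    """Schematic Explore-Estimate-Eliminate-Exploit loop (illustrative).

    arms: (K, d) array of arm feature vectors; pull(a) returns a reward.
    """
    d = arms.shape[1]
    active = list(range(len(arms)))
    t, theta_hat = 0, np.zeros(d)
    for i in range(1, num_batches + 1):
        # Explore: pull active arms round-robin until the batch endpoint.
        end_i = min(T, round(T ** (1.0 - 2.0 ** (-i))))  # placeholder grid
        X, y, k = [], [], 0
        while t < end_i:
            a = active[k % len(active)]
            X.append(arms[a]); y.append(pull(a))
            t += 1; k += 1
        if not X:
            continue
        # Estimate: ridge-regularized least squares on this batch's data.
        X, y = np.array(X), np.array(y)
        theta_hat = np.linalg.solve(X.T @ X + 1e-6 * np.eye(d), X.T @ y)
        # Eliminate: drop arms trailing the empirical best by a margin.
        est = arms[active] @ theta_hat
        margin = 2.0 * np.sqrt(d * np.log(T) / max(t, 1))  # placeholder radius
        active = [a for a, v in zip(active, est) if v >= est.max() - margin]
        if len(active) == 1:
            break
    # Exploit: commit to the best surviving arm for the remaining rounds.
    best = active[int(np.argmax(arms[active] @ theta_hat))]
    while t < T:
        pull(best); t += 1
    return best
```

On a noiseless toy instance (arms e_1, e_2, and 0.6 e_1 with theta* = e_1), this sketch eliminates both suboptimal arms within a few batches and commits to e_1.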