Delayed Bandits: When Do Intermediate Observations Help?

Authors: Emmanuel Esposito, Saeed Masoudian, Hao Qiu, Dirk Van Der Hoeven, Nicolò Cesa-Bianchi, Yevgeny Seldin

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 6. Experiments: We empirically compare our algorithm MetaBIO with the following baselines: DAda-Exp3 (György & Joulani, 2021) for adversarial delayed bandits without intermediate observations (which we used to instantiate the algorithm B), the standard UCB1 algorithm (Auer et al., 2002a) for stochastic bandits without delays and intermediate observations, and NSD-UCRL2 (Vernade et al., 2020) for nonstationary stochastic action-state mappings and stochastic losses. We run all experiments with a time horizon of T = 10^4.
Researcher Affiliation | Academia | (1) Università degli Studi di Milano, Milan, Italy; (2) Istituto Italiano di Tecnologia, Genoa, Italy; (3) University of Copenhagen, Copenhagen, Denmark; (4) Korteweg-de Vries Institute for Mathematics, University of Amsterdam, Amsterdam, Netherlands; (5) Politecnico di Milano, Milan, Italy.
Pseudocode | Yes | Algorithm 1: MetaBIO
    Input: Algorithm B for standard delayed bandits, confidence parameter δ ∈ (0, 1)
    Initialize L(s) = ∅ for all s ∈ S
    for t = 1, ..., T do
        Get A_t from B
        Observe S_t = s_t(A_t)
        for j : j + d_j = t do
            Receive (j, ℓ_j(S_j))
            Update L(S_j) = L(S_j) ∪ {(j, ℓ_j(S_j))} and ...
  Algorithm 2: MetaAdaBIO
    Input: Algorithm B for standard delayed bandits, confidence parameter δ ∈ (0, 1)
    Initialize D_0 = 0
    for t = 1, ..., T do
        Get A_t from B
        for j : j + d_j = t do
            Receive (j, ℓ_j(S_j))
            Feed (j, A_j, ℓ_j(S_j)) to B
Open Source Code | No | The paper does not provide any statement or link indicating that open-source code for the described methodology is available.
Open Datasets | No | The paper describes experimental setups (e.g., "stationary version of the experiments in (Vernade et al., 2020)") and parameters (Table 1, Table 2 in Appendix D) but does not provide concrete access information (link, DOI, specific citation with author/year for a dataset) for a publicly available or open dataset.
Dataset Splits | No | The paper does not provide specific dataset split information (percentages, sample counts, or detailed splitting methodology) for training, validation, or testing.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory, or computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions names of algorithms (e.g., DAda-Exp3, UCB1, NSD-UCRL2) and general programming environments (e.g., Java in the acknowledgements of a cited work), but does not specify software dependencies with version numbers for reproducibility.
Experiment Setup | Yes | All our experiments use a time horizon of T = 10^4. ... We set K = 4 and S = 3, while we repeat this experiment for the previously mentioned values of delays. ... We set K = 4, S = 3... The interval between two consecutive changes in the distribution of action-state mappings grows exponentially. See Table 2 in Appendix D for details. ... Here we set K = 8, d = 100, and investigate how the performance of MetaBIO changes when the number S of states varies in {4, 6, 8, 10, 12}.
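
The pseudocode excerpts in the table are truncated, so the following Python sketch covers only the delayed-feedback bookkeeping that is visible in them: get an action from B, observe the intermediate state, and, once a loss arrives, store it in the per-state set L(s) and feed it back to B. It is not the authors' implementation and does not attempt the elided steps of either algorithm. The class `UniformBase`, the function `run_bookkeeping`, and the state and loss functions used in the example are hypothetical stand-ins; only K = 8, d = 100, S = 4, and T = 10^4 are taken from the experiment setup row.

```python
import random
from collections import defaultdict


class UniformBase:
    """Hypothetical stand-in for algorithm B: picks actions uniformly at random."""

    def __init__(self, num_actions, seed=0):
        self.num_actions = num_actions
        self.rng = random.Random(seed)
        self.feedback = []  # (round, action, loss) triples received so far

    def get_action(self):
        return self.rng.randrange(self.num_actions)

    def feed(self, j, action, loss):
        # A real base algorithm would update its internal statistics here.
        self.feedback.append((j, action, loss))


def run_bookkeeping(base, horizon, delay_of, state_of, loss_of):
    """Run only the visible bookkeeping steps of the excerpts (assumed sketch).

    delay_of(j) is the delay d_j, state_of(t, a) is the state s_t(a),
    and loss_of(j, s) is the loss l_j(s). All three are assumptions.
    """
    per_state = defaultdict(list)  # L(s), empty for every state at the start
    arrivals = defaultdict(list)   # arrival round -> rounds j with j + d_j equal to it
    actions, states = {}, {}
    for t in range(1, horizon + 1):
        actions[t] = base.get_action()              # Get A_t from B
        states[t] = state_of(t, actions[t])         # Observe S_t = s_t(A_t)
        arrivals[t + delay_of(t)].append(t)         # loss of round t arrives later
        for j in arrivals.pop(t, []):               # for j : j + d_j = t
            loss = loss_of(j, states[j])            # Receive (j, l_j(S_j))
            per_state[states[j]].append((j, loss))  # L(S_j) = L(S_j) U {(j, l_j(S_j))}
            base.feed(j, actions[j], loss)          # Feed (j, A_j, l_j(S_j)) to B
    return per_state


if __name__ == "__main__":
    K, S, T, d = 8, 4, 10**4, 100  # values quoted in the experiment setup row
    base = UniformBase(K)
    per_state = run_bookkeeping(
        base,
        horizon=T,
        delay_of=lambda j: d,              # fixed delay
        state_of=lambda t, a: a % S,       # hypothetical action-state mapping
        loss_of=lambda j, s: s / (S - 1),  # hypothetical loss in [0, 1]
    )
    print(sum(len(v) for v in per_state.values()))
```

With a fixed delay d, only the losses of rounds 1 through T - d arrive within the horizon, so the example stores 9900 losses. Losses still in flight at round T are dropped, which is why delayed-bandit regret bounds depend on the total delay.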