Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk

Authors: Tianrui Chen, Aditya Gangrade, Venkatesh Saligrama

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "This theoretical analysis is complemented by simulation studies demonstrating the effectiveness of the proposed schema... Empirical Results. We complement the above theoretical study with simulations."
Researcher Affiliation | Academia | Boston University; Carnegie Mellon University.
Pseudocode | Yes | Algorithm 1: Doubly Optimistic Confidence Bounds; Algorithm 2: Thompson Sampling With Optimistic Safety Indices (TOPSI) for Bernoulli Bandits; Algorithm 3: Thompson Sampling with BayesUCB (TSBU) for Bernoulli Bandits.
Open Source Code | No | The paper contains no explicit statement about releasing source code and no link to a code repository.
Open Datasets | Yes | "For the sake of realism, we use the data of Genovese et al. (2013), who report efficacy and infection rates from a phase 2 randomised trial for various dosages of a drug to treat rheumatoid arthritis."
Dataset Splits | No | The paper describes simulations over a "horizon" and repeated "trials" (e.g., "100 trials of horizon 50000"), but it does not specify training, validation, or test splits in the sense of a typical machine learning reproduction.
Hardware Specification | No | The paper states that the methods are implemented in MATLAB, but it does not specify the hardware (CPU, GPU model, memory, etc.) used to run the experiments.
Software Dependencies | No | The paper states "All methods are implemented on MATLAB" and mentions specific MATLAB functions such as betainv and betarnd from the Statistics Toolbox, but it gives no version numbers for MATLAB or its toolboxes.
Experiment Setup | Yes | "The data reported is across 100 trials of horizon 50000. ... We study the safety level 0.21... KL-UCB-based bounds are all evaluated with $\gamma_t = 1/t$... BayesUCB-based bounds are all evaluated with $\delta^k_t = 1/(t + 1)$."
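The setup above evaluates KL-UCB-based confidence bounds with $\gamma_t = 1/t$ for Bernoulli arms. The Python sketch below shows how such bounds could drive a doubly optimistic arm choice. It is not the paper's Algorithm 1 (the paper's code is MATLAB and unreleased). It rests on several assumptions: each pull observes both a reward and a safety-risk outcome; each bound is the extreme $q$ with $N \cdot \mathrm{kl}(\hat p, q) \le \log(1/\gamma_t) = \log t$; an arm is admissible when its risk lower bound is at most the safety level; and the admissible arm with the largest reward upper bound is played. The helper names `bernoulli_kl`, `kl_bound`, and `doubly_optimistic_choice` are hypothetical.

```python
import math
from typing import List, Optional

EPS = 1e-12  # clamp so the logarithms stay finite when p or q is 0 or 1


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence kl(p, q) between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, EPS), 1.0 - EPS)
    q = min(max(q, EPS), 1.0 - EPS)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_bound(p_hat: float, n: int, t: int, upper: bool = True, iters: int = 60) -> float:
    """KL confidence bound on a Bernoulli mean from n samples at round t >= 1.

    Assumption: the bound is the extreme q with n * kl(p_hat, q) <= log(1/gamma_t),
    with gamma_t = 1/t, so the right-hand side is log(t).
    """
    if n == 0:
        return 1.0 if upper else 0.0  # an unpulled arm gets the vacuous bound
    threshold = math.log(t) / n
    # kl(p_hat, .) grows as q moves away from p_hat, so bisect between a point
    # known to satisfy the constraint (p_hat) and the end of the interval.
    inside, outside = p_hat, (1.0 if upper else 0.0)
    for _ in range(iters):
        mid = (inside + outside) / 2.0
        if bernoulli_kl(p_hat, mid) <= threshold:
            inside = mid
        else:
            outside = mid
    return inside


def doubly_optimistic_choice(reward_means: List[float], risk_means: List[float],
                             counts: List[int], t: int, alpha: float) -> Optional[int]:
    """Hypothetical selection rule: among arms whose risk lower bound is at most
    alpha (optimistic about safety), return the one with the largest reward upper
    bound (optimistic about reward). Returns None when no arm is admissible,
    a case this sketch deliberately leaves unhandled."""
    admissible = [k for k in range(len(counts))
                  if kl_bound(risk_means[k], counts[k], t, upper=False) <= alpha]
    if not admissible:
        return None
    return max(admissible,
               key=lambda k: kl_bound(reward_means[k], counts[k], t, upper=True))


# Two arms after t = 100 rounds with 50 pulls each, safety level 0.21.
print(doubly_optimistic_choice([0.9, 0.5], [0.5, 0.1], [50, 50], t=100, alpha=0.21))  # prints 1
```

In the usage line, arm 0 has the higher empirical reward, but its risk lower bound (about 0.295 for an empirical risk of 0.5 over 50 pulls) exceeds 0.21, so the rule falls back to arm 1. Bisection is used because the KL bound has no closed form. Its monotonicity on each side of $\hat p$ makes a fixed number of halvings enough.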