Optimal Best-Arm Identification Methods for Tail-Risk Measures

Authors: Shubhada Agrawal, Wouter M. Koolen, Sandeep Juneja

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Figure 1: Histogram of stopping times among 1000 runs on 3 arms, as a function of confidence δ. Vertical bars (solid) indicate the lower bound (4), and (dashed) a version adjusted to our stopping threshold (7), i.e., the n that solves n = β(n, δ)V (µ)-1. In our experiments we implement a version of Track-and-Stop including C-tracking and forced exploration and apply it to Fisher-Tippett (F(µ, σ, γ)), Pareto (P(µ, σ, γ)), and mixtures of Fisher Tippett arms (these heavy-tailed distributions arise in extreme value theory).
Researcher Affiliation Academia Shubhada Agrawal TIFR, Mumbai, India shubhadaiitd@gmail.com Wouter M. Koolen CWI, Amsterdam wmkoolen@cwi.nl Sandeep Juneja TIFR, Mumbai, India juneja@tifr.res.in
Pseudocode No The paper describes the algorithm components (sampling rule, stopping rule, recommendation rule) in text but does not provide a structured pseudocode or algorithm block.
Open Source Code No The paper does not provide concrete access to source code or explicitly state that it is open-source.
Open Datasets No In our experiments we implement a version of Track-and-Stop including C-tracking and forced exploration and apply it to Fisher-Tippett (F(µ, σ, γ)), Pareto (P(µ, σ, γ)), and mixtures of Fisher Tippett arms (these heavy-tailed distributions arise in extreme value theory). Figure 1 shows the distribution of the stopping time as a function of δ in a synthetic three-arm task. No publicly available datasets are mentioned or linked.
Dataset Splits No The paper describes a synthetic multi-armed bandit task where samples are generated sequentially by the algorithm; therefore, traditional dataset splits (training, validation, test) are not applicable or mentioned.
Hardware Specification No The paper does not explicitly describe the hardware used for running its experiments.
Software Dependencies No The paper does not provide specific software dependencies with version numbers needed for replication.
Experiment Setup Yes We select ϵ = 0.7 and B = 4.5. This is a moderately hard problem of complexity V -1(µ) = 49.7. We conclude that even at moderate δ the average sample complexity closely matches the lower bound, especially after adjusting it for the lowerorder terms in the employed stopping threshold β(n, δ).