A framework for Multi-A(rmed)/B(andit) Testing with Online FDR Control

Authors: Fanny Yang, Aaditya Ramdas, Kevin G. Jamieson, Martin J. Wainwright

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We run extensive simulations to verify our claims, and also report results on real data collected from the New Yorker Cartoon Caption contest.
Researcher Affiliation | Academia | Fanny Yang, Dept. of EECS, U.C. Berkeley (fanny-yang@berkeley.edu); Aaditya Ramdas, Dept. of EECS and Statistics, U.C. Berkeley (ramdas@berkeley.edu); Kevin Jamieson, Allen School of CSE, U. of Washington (jamieson@cs.washington.edu); Martin Wainwright, Dept. of EECS and Statistics, U.C. Berkeley (wainwrig@berkeley.edu)
Pseudocode | Yes | Procedure 1 (MAB-FDR): meta-algorithm skeleton. Algorithm 1: best-arm identification with a control arm for confidence δ and precision ϵ. Procedure 2 (MAB-LORD): best-arm identification with online FDR control.
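The table only names these procedures; as a reading aid, here is a minimal, hypothetical Python sketch of the loop that Procedure 1 describes: an online FDR rule hands out a test level α_j, a best-arm identification subroutine is run at confidence δ = α_j, and its rejection decision is fed back. The helpers `lord_pp_alpha` and `best_arm_vs_control` are illustrative stand-ins (a LORD++-style level update and a naive equal-allocation comparison against the control arm), not the paper's exact Procedure 2 or Algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def gamma(j, c=0.07):
    # Discount sequence gamma_j = c * log(j v 2) / (j * exp(sqrt(log j))),
    # the LORD weights quoted in the experiment-setup row below.
    return c * np.log(max(j, 2)) / (j * np.exp(np.sqrt(np.log(j))))

def lord_pp_alpha(j, rejection_times, alpha=0.05, w0=0.025):
    # LORD++-style test level (illustrative stand-in for the paper's Procedure 2):
    # alpha_j = gamma_j * w0 + (alpha - w0) * gamma_{j - tau_1}
    #           + alpha * sum_{k >= 2} gamma_{j - tau_k}.
    level = gamma(j) * w0
    if rejection_times:
        level += (alpha - w0) * gamma(j - rejection_times[0])
        level += alpha * sum(gamma(j - tau) for tau in rejection_times[1:])
    return level

def best_arm_vs_control(means, delta, budget=2000):
    # Naive stand-in for Algorithm 1: pull every arm equally often, then reject
    # the null if the best empirical alternative beats the control arm (index 0)
    # by twice a Hoeffding-style margin at confidence delta.
    n_arms = len(means)
    pulls = budget // n_arms
    emp = rng.binomial(1, means, size=(pulls, n_arms)).mean(axis=0)
    margin = np.sqrt(np.log(2 * n_arms / delta) / (2 * pulls))
    best = 1 + int(np.argmax(emp[1:]))
    return emp[best] - emp[0] > 2 * margin

# MAB-FDR meta loop: the online FDR rule hands out a level alpha_j, the bandit
# subroutine runs at confidence delta = alpha_j, and its decision is fed back.
rejection_times = []
for j in range(1, 31):
    alpha_j = lord_pp_alpha(j, rejection_times)
    means = np.full(10, 0.5)   # Bernoulli control arm plus 9 alternatives
    if j % 3 == 0:             # every third experiment has a truly better arm
        means[-1] = 0.9
    if best_arm_vs_control(means, delta=alpha_j):
        rejection_times.append(j)

print("rejected hypotheses:", rejection_times)
```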
Open Source Code | Yes | The code for reproducing all experiments and plots in this paper is publicly available at https://github.com/fanny-yang/MABFDR
Open Datasets | Yes | Our experiments are run on artificial data with Gaussian/Bernoulli draws and real-world Bernoulli draws from the New Yorker Cartoon Caption contest. We have access to 1000 such contests over a period of 4 years.
Dataset Splits | Yes | In all simulations, 60% of the hypotheses are true nulls, and their indices are chosen uniformly at random. The results in Section 4 are based on two experimental settings: (i) an independent setting in which K = 50 arms are simulated for each hypothesis; for the remaining 40% of hypotheses (non-nulls), the mean µ_i of the best alternative is chosen randomly in [0.05, 0.2] and the other alternatives randomly in [0.0, 0.1]; (ii) a dependent setting (New Yorker data) in which the alternatives are not chosen independently. All results are averaged over 100 repetitions.
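For concreteness, the independent setting described above could be generated along the following lines. This is a sketch under stated assumptions: the control-arm mean `mu_control = 0.0`, reading the quoted ranges as gaps over the control, and setting all arms of a true null equal to the control are choices made here for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_independent_setting(n_hyp=500, n_arms=50, frac_null=0.6, mu_control=0.0):
    # 60% of the hypotheses are true nulls, indices chosen uniformly at random;
    # for the rest, the best alternative lies 0.05-0.2 above the control and the
    # other alternatives 0.0-0.1 above it (ranges read as gaps over the control).
    is_null = np.zeros(n_hyp, dtype=bool)
    null_idx = rng.choice(n_hyp, size=int(frac_null * n_hyp), replace=False)
    is_null[null_idx] = True

    experiments = []
    for j in range(n_hyp):
        mu = np.full(n_arms, mu_control)        # arm 0 is the control arm
        if not is_null[j]:
            mu[1:] = mu_control + rng.uniform(0.0, 0.1, size=n_arms - 1)
            best = rng.integers(1, n_arms)      # pick which alternative is best
            mu[best] = mu_control + rng.uniform(0.05, 0.2)
        experiments.append(mu)
    return experiments, is_null

experiments, is_null = make_independent_setting()
print(int(is_null.sum()), "true nulls out of", len(experiments), "hypotheses")
```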
Hardware Specification | No | The paper does not specify any hardware details such as CPU models, GPU models, or memory used for running the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | Unless otherwise noted, we set ϵ = 0 in all simulations to focus on the main ideas and keep the discussion concise. The discount sequence is γ_j = 0.07 log(j ∨ 2) / (j e^{√(log j)}), as in [4]. In the independent setting, K = 50 arms are simulated for each hypothesis; 60% of the hypotheses are true nulls, and for the remaining 40% (non-nulls) the mean µ_i of the best alternative is chosen randomly in [0.05, 0.2] and the other alternatives randomly in [0.0, 0.1]. All results are averaged over 100 repetitions.
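Read literally, the discount sequence above is straightforward to compute; a minimal NumPy sketch, with the constant 0.07 taken from the quoted setup (the exact normalizing constant in [4] may differ slightly):

```python
import numpy as np

def gamma_sequence(n, c=0.07):
    # gamma_j = c * log(j v 2) / (j * exp(sqrt(log j))) for j = 1..n: the
    # discount sequence that spreads the online FDR budget over the hypotheses.
    j = np.arange(1, n + 1, dtype=float)
    return c * np.log(np.maximum(j, 2.0)) / (j * np.exp(np.sqrt(np.log(j))))

g = gamma_sequence(1000)
print(g[:5])       # first few weights; gamma_1 = 0.07 * log(2), roughly 0.0485
print(g.sum())     # partial sum of the slowly decaying series
```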