A framework for Multi-A(rmed)/B(andit) Testing with Online FDR Control
Authors: Fanny Yang, Aaditya Ramdas, Kevin G. Jamieson, Martin J. Wainwright
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We run extensive simulations to verify our claims, and also report results on real data collected from the New Yorker Cartoon Caption contest. |
| Researcher Affiliation | Academia | Fanny Yang, Dept. of EECS, U.C. Berkeley, fanny-yang@berkeley.edu; Aaditya Ramdas, Dept. of EECS and Statistics, U.C. Berkeley, ramdas@berkeley.edu; Kevin Jamieson, Allen School of CSE, U. of Washington, jamieson@cs.washington.edu; Martin Wainwright, Dept. of EECS and Statistics, U.C. Berkeley, wainwrig@berkeley.edu |
| Pseudocode | Yes | Procedure 1: MAB-FDR meta-algorithm skeleton. Algorithm 1: Best-arm identification with a control arm for confidence δ and precision ϵ. Procedure 2: MAB-LORD, best-arm identification with online FDR control. |
| Open Source Code | Yes | The code for reproducing all experiments and plots in this paper is publicly available at https://github.com/fanny-yang/MABFDR |
| Open Datasets | Yes | Our experiments are run on artificial data with Gaussian/Bernoulli draws and real-world Bernoulli draws from the New Yorker Cartoon Caption contest. We have access to 1000 such contests over a period of 4 years. |
| Dataset Splits | Yes | In all simulations, 60% of all the hypotheses are true nulls, and their indices are chosen uniformly. The results in Section 4 are based on two different experimental settings: (i) an independent setting where we simulate K = 50 arms for each hypothesis, where we chose 60% of hypotheses to be true nulls and for the remaining 40% (non-nulls) we chose $\mu_i$ for the best alternative randomly in [0.05, 0.2] and other alternatives randomly in [0.0, 0.1]; (ii) a dependent setting (New Yorker data) where the alternatives are not chosen independently. For all results, we average over 100 repetitions. |
| Hardware Specification | No | The paper does not specify any hardware details like CPU models, GPU models, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | Unless otherwise noted, we set ϵ = 0 in all of our simulations to focus on the main ideas and keep the discussion concise. $\gamma_j = 0.07 \log(j \vee 2) / (j e^{\sqrt{\log j}})$ as in [4]. (i) an independent setting where we simulate K = 50 arms for each hypothesis, where we chose 60% of hypotheses to be true nulls and for the remaining 40% (non-nulls) we chose $\mu_i$ for the best alternative randomly in [0.05, 0.2] and other alternatives randomly in [0.0, 0.1]. For all results, we average over 100 repetitions. |
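
The independent simulation setting and the γ sequence quoted in the table can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code (their implementation is in the linked repository). The function names, the `seed` parameter, and the `null_mean` value used for true-null hypotheses are assumptions introduced here; the table does not state the arm means under the null. The γ formula is read as $0.07 \log(j \vee 2) / (j e^{\sqrt{\log j}})$.

```python
import math
import random


def gamma_sequence(n, c=0.07):
    """Return [gamma_1, ..., gamma_n] with
    gamma_j = c * log(max(j, 2)) / (j * exp(sqrt(log j)))."""
    return [
        c * math.log(max(j, 2)) / (j * math.exp(math.sqrt(math.log(j))))
        for j in range(1, n + 1)
    ]


def draw_independent_setting(n_hyp, k_arms=50, null_frac=0.6,
                             null_mean=0.0, seed=0):
    """Draw arm means for the independent setting described in the table.

    Assumptions (not stated in the table): true-null hypotheses get the
    same mean `null_mean` for every arm, and `k_arms` counts only the
    alternative arms.
    """
    rng = random.Random(seed)

    # 60% of hypotheses are true nulls, with indices chosen uniformly.
    n_null = round(null_frac * n_hyp)
    null_idx = set(rng.sample(range(n_hyp), n_null))
    is_null = [i in null_idx for i in range(n_hyp)]

    means = []
    for i in range(n_hyp):
        if is_null[i]:
            row = [null_mean] * k_arms
        else:
            # Other alternatives: uniform in [0.0, 0.1].
            row = [rng.uniform(0.0, 0.1) for _ in range(k_arms)]
            # Designated best alternative: uniform in [0.05, 0.2].
            # The ranges overlap, so this arm is not guaranteed to
            # have the largest mean in the row.
            row[rng.randrange(k_arms)] = rng.uniform(0.05, 0.2)
        means.append(row)
    return is_null, means


if __name__ == "__main__":
    print(gamma_sequence(3))
    is_null, means = draw_independent_setting(n_hyp=100)
    print(sum(is_null), "true nulls out of", len(is_null))
```

To mirror the reported protocol, the draw would be repeated with 100 different seeds and the resulting metrics averaged.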