On Regret with Multiple Best Arms

Authors: Yinglun Zhu, Robert Nowak

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results confirm our theoretical guarantees and show advantages of our algorithms over the previous state-of-the-art. We conduct three experiments to compare our algorithms with baselines. In Section 6.1, we compare the performance of each algorithm on problems with varying hardness levels. We examine how the regret curve of each algorithm increases on synthetic and real-world datasets in Section 6.2 and Section 6.3, respectively."
Researcher Affiliation | Academia | Yinglun Zhu, Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706 (yinglun@cs.wisc.edu); Robert Nowak, Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI 53706 (rdnowak@wisc.edu)
Pseudocode | Yes | Algorithm 1: MOSS++. Input: time horizon T and user-specified parameter β ∈ [1/2, 1]. Algorithm 2: MOSS Subroutine (SR). Input: time horizon T and hardness level α. Algorithm 3: Parallel. Input: time horizon T and the optimal reward µ*.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for its algorithms (MOSS++, Parallel, empMOSS++) is publicly available.
Open Datasets | Yes | "We use a real-world dataset from the New Yorker Magazine Cartoon Caption Contest. The dataset of 1-3 star caption ratings/rewards for Contest 652 consists of n = 10025 captions." Available online at https://nextml.github.io/caption-contest-data.
Dataset Splits | No | The paper studies a multi-armed bandit problem, i.e., sequential decision-making over a time horizon T. This setting does not use the fixed training/validation/test splits of supervised learning, and no such splits are specified.
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU/GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list any software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or specialized solvers).
Experiment Setup | Yes | "We set the time horizon to T = 50000 and consider the total number of arms n = 20000. We choose β = 0.5 for MOSS++ and empMOSS++ in all experiments. For this experiment, we generate best arms with expected reward 0.9 and sub-optimal arms with expected rewards evenly distributed among {0.1, 0.2, 0.3, 0.4, 0.5}. All arms follow Bernoulli distributions. We set T = 10^5, which results in a hardness level of around α ≈ 0.43."
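The synthetic instance described above can be sketched in a few lines. The number of best arms is not stated in this excerpt (in the paper it is governed by the hardness level α), so `n_best=200` below is purely an illustrative placeholder; everything else follows the quoted setup.

```python
import random

def make_instance(n=20000, n_best=200, best_reward=0.9,
                  suboptimal_rewards=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Bernoulli means for the synthetic experiment: `n_best` best arms
    with mean 0.9, remaining arms spread evenly over {0.1, ..., 0.5}."""
    means = [best_reward] * n_best
    for i in range(n - n_best):
        means.append(suboptimal_rewards[i % len(suboptimal_rewards)])
    return means

def pull(mean, rng=random):
    """Sample one Bernoulli reward with the given mean."""
    return 1.0 if rng.random() < mean else 0.0
```

A bandit algorithm would then interact with the instance only through `pull(means[i])`, matching the sequential protocol over the horizon T.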