Choice Bandits

Authors: Arpit Agarwal, Nicholas Johnson, Shivani Agarwal

Venue: NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments demonstrate that for the special case of k = 2, our algorithm is competitive with previous dueling bandit algorithms, and for the more general case k > 2, outperforms the recently proposed Max Min UCB algorithm designed for the MNL model."
Researcher Affiliation | Academia | Arpit Agarwal, University of Pennsylvania, Philadelphia, PA 19104, USA (aarpit@seas.upenn.edu); Nicholas Johnson, University of Minnesota, Minneapolis, MN 55455, USA (njohnson@cs.umn.edu); Shivani Agarwal, University of Pennsylvania, Philadelphia, PA 19104, USA (ashivani@seas.upenn.edu)
Pseudocode | Yes | Algorithm 1: Winner Beats All (WBA); a hedged sketch of an elimination-style loop of this kind appears below the table.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | "Sushi: Choice model extracted from the Sushi dataset [47];"
Dataset Splits | No | The paper does not specify dataset splits for training, validation, or testing, nor does it mention cross-validation; it refers only generally to 'datasets' for its experiments.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers).
Experiment Setup | Yes | "The parameter C in our algorithm was set to 1. ... We set α = 0.51 for RUCB and DTS, and f(K) = 0.3·K^1.01 for RMED, and γ = 1.3 for BTM. ... We set the parameter α to be 0.51 for Max Min UCB." (These settings are collected into a single config sketch below.)
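
The paper's pseudocode (Algorithm 1, Winner Beats All) is not reproduced on this page. As a rough illustration of how an elimination-style choice-bandit loop of this general kind can be organized, here is a minimal Python sketch. The names `choice_oracle` and `wba_sketch`, the set-selection rule, and the pairwise-UCB elimination test are all our own illustrative choices, not the authors' exact algorithm; only the constant C (set to 1 in the paper's experiments) comes from the source.

```python
import math
import random

def wba_sketch(n_arms, k, horizon, choice_oracle, C=1.0):
    """Illustrative elimination-style choice-bandit loop (NOT the paper's
    exact WBA algorithm). choice_oracle(S) is assumed to return the element
    of the offered set S that the user chooses."""
    # wins[i][j]: number of rounds where i was chosen while j was also offered.
    wins = [[0] * n_arms for _ in range(n_arms)]
    active = set(range(n_arms))
    for t in range(1, horizon + 1):
        # Offer up to k surviving arms; a real algorithm would pick this set adaptively.
        S = random.sample(sorted(active), min(k, len(active)))
        winner = choice_oracle(S)
        for j in S:
            if j != winner:
                wins[winner][j] += 1
        # Drop arm i once some arm j beats it even under an optimistic bound.
        for i in list(active):
            for j in list(active):
                if i == j:
                    continue
                m = wins[i][j] + wins[j][i]
                if m == 0:
                    continue
                p_hat = wins[i][j] / m                      # empirical win rate of i over j
                radius = math.sqrt(C * math.log(t + 1) / m)  # confidence radius
                if p_hat + radius < 0.5:                     # i loses to j with high confidence
                    active.discard(i)
                    break
    return active
```

Returning the surviving set lets a caller check whether the loop has narrowed play down to a single "beats all" arm.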
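
For convenience, the hyperparameter settings quoted in the Experiment Setup row can be gathered in one place. A minimal sketch: the dictionary layout and key names are our own, but the values are those reported in the paper.

```python
# Hyperparameter values as reported in the paper's experiment setup.
# The dict layout and key names are illustrative, not the authors' code.
EXPERIMENT_CONFIG = {
    "WBA":         {"C": 1.0},                        # the proposed algorithm
    "RUCB":        {"alpha": 0.51},
    "DTS":         {"alpha": 0.51},
    "RMED":        {"f": lambda K: 0.3 * K ** 1.01},  # exploration schedule f(K)
    "BTM":         {"gamma": 1.3},
    "Max Min UCB": {"alpha": 0.51},
}
```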