Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Dueling Bandits: Beyond Condorcet Winners to General Tournament Solutions

Authors: Siddartha Y. Ramamohan, Arun Rajkumar, Shivani Agarwal

NeurIPS 2016 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments confirm the ability of our algorithms to achieve low regret relative to the target winning set of interest. Here we provide an empirical evaluation of the performance of the proposed dueling bandit algorithms. We used the following three preference matrices in our experiments, one synthetic and two real-world, none of which possesses a Condorcet winner: P^Hudry (13), P^Tennis (8), P^MSLR (16).
Researcher Affiliation | Collaboration | Siddartha Ramamohan, Indian Institute of Science, Bangalore 560012, India; Arun Rajkumar, Xerox Research, Bangalore 560103, India; Shivani Agarwal, University of Pennsylvania, Philadelphia, PA 19104, USA
Pseudocode | Yes | Algorithm 1: UCB-TS; Algorithm 2: SELECTPROC-TC; Algorithm 3: SELECTPROC-UC; Algorithm 4: SELECTPROC-BA
Open Source Code | No | The paper does not provide any explicit statement about releasing open-source code or a link to a code repository.
Open Datasets | Yes | We used the following three preference matrices in our experiments, one synthetic and two real-world, none of which possesses a Condorcet winner: P^Hudry (13): constructed from the Hudry tournament shown in Figure 2(b); as noted earlier, this is the smallest tournament whose Copeland set and Banks set are disjoint [14]. ... P^MSLR (16): constructed from real data from the Microsoft Learning to Rank (MSLR) Web10K data set.
Dataset Splits | No | The paper discusses the number of trials (T) for the dueling bandit algorithms but does not specify explicit training, validation, or test splits with percentages, sample counts, or cross-validation; the experimental setup involves continuous online interaction rather than predefined static splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or cloud computing instance specifications.
Software Dependencies | No | The paper describes experimental parameters for the algorithms (e.g., exploration parameter α, confidence parameter δ) but does not list any software dependencies or libraries with version numbers (e.g., programming languages, frameworks, or specific package versions).
Experiment Setup | Yes | For all the UCB-based algorithms (including our algorithms, RUCB, and CCB), we set the exploration parameter α to 0.51; for SAVAGE-CO, we set the confidence parameter δ to 1/T; and for BTMB, we set δ to 1/T and chose γ to satisfy the γ-relaxed stochastic transitivity property for each preference matrix.
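To make the role of the exploration parameter α concrete, here is a minimal sketch of the RUCB-style optimistic pairwise estimate commonly used by UCB-based dueling bandit algorithms. This is an illustrative assumption, not the paper's exact update rule; only the value α = 0.51 is taken from the reported experiment setup, and the function name and signature are hypothetical.

```python
import math

def ucb_preference(wins_ij, wins_ji, t, alpha=0.51):
    """Optimistic (upper-confidence) estimate of P(arm i beats arm j)
    at round t, in the style of RUCB-type dueling bandit algorithms.

    wins_ij: number of comparisons so far in which i beat j
    wins_ji: number of comparisons so far in which j beat i
    alpha:   exploration parameter (the paper sets alpha = 0.51
             for all UCB-based algorithms in its experiments)
    """
    n = wins_ij + wins_ji
    if n == 0:
        # No comparisons yet: be maximally optimistic so the pair
        # is eventually explored.
        return 1.0
    mean = wins_ij / n                       # empirical win rate
    bonus = math.sqrt(alpha * math.log(t) / n)  # confidence width
    return min(1.0, mean + bonus)
```

Larger α widens the confidence bonus and forces more exploration; α slightly above 0.5 (as in the reported setup) is the smallest value for which standard UCB-style concentration arguments go through.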