Dueling Bandits: Beyond Condorcet Winners to General Tournament Solutions

Authors: Siddartha Y. Ramamohan, Arun Rajkumar, Shivani Agarwal

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments confirm the ability of our algorithms to achieve low regret relative to the target winning set of interest. Here we provide an empirical evaluation of the performance of the proposed dueling bandit algorithms. We used the following three preference matrices in our experiments, one synthetic and two real-world, none of which possesses a Condorcet winner: P_Hudry (13 arms), P_Tennis (8 arms), P_MSLR (16 arms). (An illustrative example of a Condorcet-winner-free preference matrix appears after this table.)
Researcher Affiliation | Collaboration | Siddartha Ramamohan, Indian Institute of Science, Bangalore 560012, India, siddartha.yr@csa.iisc.ernet.in; Arun Rajkumar, Xerox Research, Bangalore 560103, India, arun_r@csa.iisc.ernet.in; Shivani Agarwal, University of Pennsylvania, Philadelphia, PA 19104, USA, ashivani@seas.upenn.edu
Pseudocode | Yes | Algorithm 1: UCB-TS; Algorithm 2: SELECTPROC-TC; Algorithm 3: SELECTPROC-UC; Algorithm 4: SELECTPROC-BA
Open Source Code | No | The paper does not provide any explicit statement about releasing open-source code or a link to a code repository.
Open Datasets | Yes | We used the following three preference matrices in our experiments, one synthetic and two real-world, none of which possesses a Condorcet winner: P_Hudry (13 arms): This is constructed from the Hudry tournament shown in Figure 2(b); as noted earlier, this is the smallest tournament whose Copeland set and Banks set are disjoint [14]. ... P_MSLR (16 arms): This is constructed from real data from the Microsoft Learning to Rank (MSLR) Web10K data set.
Dataset Splits | No | The paper discusses the number of trials (T) for the dueling bandit algorithms but does not specify explicit training, validation, or test dataset splits with percentages, sample counts, or cross-validation methods. The experimental setup involves continuous interaction rather than predefined static splits.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or cloud computing instance specifications.
Software Dependencies | No | The paper describes experimental parameters for algorithms (e.g., exploration parameter α, confidence parameter δ) but does not list any specific software dependencies or libraries with their version numbers (e.g., programming languages, frameworks like PyTorch or TensorFlow, or specific library versions).
Experiment Setup | Yes | For all the UCB-based algorithms (including our algorithms, RUCB, and CCB), we set the exploration parameter α to 0.51; for SAVAGE-CO, we set the confidence parameter δ to 1/T; and for BTMB, we set δ to 1/T and chose γ to satisfy the γ-relaxed stochastic transitivity property for each preference matrix.
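
The "Research Type" and "Open Datasets" rows above quote the paper's use of preference matrices that have no Condorcet winner. As a minimal, hypothetical illustration (not one of the paper's P_Hudry, P_Tennis, or P_MSLR matrices), the Python sketch below builds a 3-arm cyclic preference matrix and checks that no arm beats every other arm with probability above 1/2.

```python
import numpy as np

# Hypothetical 3-arm cyclic preference matrix (rock-paper-scissors style);
# entry P[i, j] is the probability that arm i beats arm j in a duel.
# This is NOT one of the paper's matrices, only a minimal illustration.
P = np.array([
    [0.5, 0.6, 0.4],
    [0.4, 0.5, 0.6],
    [0.6, 0.4, 0.5],
])

def condorcet_winner(P):
    """Return the index of a Condorcet winner (an arm that beats all
    other arms with probability > 1/2), or None if no such arm exists."""
    n = P.shape[0]
    for i in range(n):
        if all(P[i, j] > 0.5 for j in range(n) if j != i):
            return i
    return None

print(condorcet_winner(P))  # None: the cycle 0 > 1 > 2 > 0 has no Condorcet winner
```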
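
The "Experiment Setup" row mentions an exploration parameter α = 0.51 shared by the UCB-based algorithms. The sketch below is our own illustration, under assumptions, of the standard optimistic estimate of pairwise preference probabilities that UCB-style dueling-bandit algorithms such as RUCB, CCB, and the UCB-TS framework maintain; the win-count array and function names are hypothetical, and the paper's SELECTPROC-* selection procedures are not reproduced.

```python
import numpy as np

def ucb_matrix(wins, t, alpha=0.51):
    """Optimistic upper confidence bounds on pairwise preference probabilities.

    wins[i, j] counts duels in which arm i beat arm j.  The bound
    p_hat + sqrt(alpha * ln(t) / n_ij) is the standard form used by
    UCB-style dueling-bandit algorithms; undueled pairs default to 1.
    (Illustrative sketch only, not the paper's exact algorithm.)
    """
    n_ij = wins + wins.T  # number of duels per pair
    with np.errstate(divide="ignore", invalid="ignore"):
        p_hat = np.where(n_ij > 0, wins / n_ij, 0.5)
    bonus = np.sqrt(alpha * np.log(t) / np.maximum(n_ij, 1))
    U = np.where(n_ij > 0, p_hat + bonus, 1.0)
    np.fill_diagonal(U, 0.5)
    return U

# Toy usage: 4 arms, a few simulated duel outcomes, time step t = 10.
wins = np.zeros((4, 4))
wins[0, 1] += 3; wins[1, 0] += 1
wins[2, 3] += 2; wins[3, 2] += 2
print(ucb_matrix(wins, t=10))
```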