When Can We Track Significant Preference Shifts in Dueling Bandits?

Authors: Joe Suk, Arpit Agarwal

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We used a geometric BTL model where the arms are linearly ordered and the i-th best arm beats the j-th best arm with probability P(i j) = 2 i 2 j+2 i . At each changepoint, the ordering of arms was randomly permuted, with a total T = 50, 000 rounds with K = 10 arms. Regret was computed over N = 50 trials of each environment with standard confidence bands shown. Algorithms. We considered four algorithms: (1) METASWIFT (Algorithm 1, (2) the ANACONDA algorithm of Buening and Saha [2022] (3) Interleaved Filtering (which we abbreviate as IF) as specified by Yue et al. [2012b], and a baseline (3) RANDDUEL which naively plays a pair of arms selected uniformly at random every round. Parameters. Parameters associated with each of the algorithms (e.g., the constants in displays (3) and (4), analogous quantities in ANACONDA, and IF s eliminination threshold) were tuned using cross validation on randomly generated geometric BTL environments with number of changepoints varying from 0 to 1000. For fairness, all algorithms were given the chance to tune parameters on the same environments before testing. The first graphic in Figure 2 shows the regret curves in a stationary environment with S = 0 changes. The second graphic shows the regret curves in a non-stationary environment with S = 4 changes. Exact mean and standard deviations on final regret are given in Table 1.
Researcher Affiliation Academia Joe Suk Columbia University joe.suk@columbia.edu Arpit Agarwal Columbia University aa4931@columbia.edu
Pseudocode Yes Algorithm 1: Meta-Elimination while Tracking Arms in SWIFT (METASWIFT) Input: horizon T. ... Algorithm 2: Base-Alg(tstart, m0): SWIFT starting at t0 and running m0 rounds Input: starting round tstart, scheduled duration m0.
Open Source Code Yes Code can be found at https://github.com/joesuk/nonstationary-duel.
Open Datasets No We used a geometric BTL model where the arms are linearly ordered and the i-th best arm beats the j-th best arm with probability P(i j) = 2 i 2 j+2 i . At each changepoint, the ordering of arms was randomly permuted, with a total T = 50, 000 rounds with K = 10 arms. (The paper describes a synthetic data generation model, not a public dataset with access information.)
Dataset Splits No The paper uses synthetic environments to generate data for evaluation and does not explicitly define training, validation, and test splits in the conventional sense for pre-existing datasets.
Hardware Specification No We also acknowledge computing resources from Columbia University’s Shared Research Computing Facility project, which is supported by NIH Research Facility Improvement Grant 1G20RR030893-01, and associated funds from the New York State Empire State Development, Division of Science Technology and Innovation (NYSTAR) Contract C090171, both awarded April 15, 2010. (This mentions a computing facility but does not provide specific hardware details like CPU/GPU models or memory.)
Software Dependencies No The paper mentions implementing algorithms and using cross-validation, but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow, specific libraries, or solvers with their versions).
Experiment Setup Yes Parameters associated with each of the algorithms (e.g., the constants in displays (3) and (4), analogous quantities in ANACONDA, and IF s eliminination threshold) were tuned using cross validation on randomly generated geometric BTL environments with number of changepoints varying from 0 to 1000.