Dueling Bandits with Adversarial Sleeping
Authors: Aadirupa Saha, Pierre Gaillard
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results are corroborated empirically. |
| Researcher Affiliation | Collaboration | Microsoft Research, New York, US; aasa@microsoft.com. Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France. pierre.gaillard@inria.fr |
| Pseudocode | Yes | Algorithm 1 SlDB-UCB |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include a link to a code repository. |
| Open Datasets | No | We use the following three different utility-based Plackett-Luce(θ) preference models (see Sec. 2) that ensure a total ordering. We now construct three types of problem instances (1. Easy, 2. Medium, 3. Hard) for any given K, such that items with their respective θ parameters are assigned as follows: 1. Easy: θ(1 : K/2) = 1, θ(K/2 + 1 : K) = 0.5. 2. Medium: θ(1 : K/3) = 1, θ(K/3 + 1 : 2K/3) = 0.7, θ(2K/3 + 1 : K) = 0.4. 3. Hard: θ(i) = 1 − (i − 1)/K, i ∈ [K]. |
| Dataset Splits | No | The paper describes an online learning framework where data is generated sequentially, and does not discuss traditional training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers for the experimental setup. |
| Experiment Setup | Yes | In every experiment, we set the learning parameters α = 0.51, δ = 1/T for SlDB-UCB (Alg. 1) and as per Thm. 6 for SlDB-ED (Alg. 2). All results are averaged over 50 runs. |
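
Since no code or dataset is released, the synthetic instances in the Open Datasets row have to be regenerated by hand. The following is a minimal sketch of how that might look, not the authors' implementation. The function names (`make_theta`, `pref_prob`, `duel`) are hypothetical, the integer (floor) split of K/2 and K/3 is an assumption for values of K that do not divide evenly, and the pairwise probability θ(i) / (θ(i) + θ(j)) is the standard Plackett-Luce pairwise marginal.

```python
import random


def make_theta(kind, K):
    """Return utility parameters for K items; list index 0 holds item 1."""
    if kind == "easy":
        half = K // 2  # assumption: floor split when K is odd
        return [1.0] * half + [0.5] * (K - half)
    if kind == "medium":
        a, b = K // 3, (2 * K) // 3  # assumption: floor splits
        return [1.0] * a + [0.7] * (b - a) + [0.4] * (K - b)
    if kind == "hard":
        # theta(i) = 1 - (i - 1)/K for 1-indexed i, i.e. 1 - idx/K for 0-indexed idx
        return [1.0 - idx / K for idx in range(K)]
    raise ValueError(f"unknown instance type: {kind}")


def pref_prob(theta, i, j):
    """Probability that item i beats item j under the Plackett-Luce model."""
    return theta[i] / (theta[i] + theta[j])


def duel(theta, i, j, rng):
    """Sample one noisy comparison; True means item i won."""
    return rng.random() < pref_prob(theta, i, j)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, chosen only for repeatability
    theta = make_theta("hard", 10)
    wins = sum(duel(theta, 0, 9, rng) for _ in range(1000))
    print(f"item 1 beat item 10 in {wins} of 1000 simulated duels")
```

Averaging over 50 independent runs, as in the Experiment Setup row, would then amount to repeating a simulation built on `duel` with 50 different seeds.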