Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
Authors: Aadirupa Saha, Shubham Gupta
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive simulations corroborate our results. |
| Researcher Affiliation | Industry | ¹Microsoft Research, New York City, United States. ²IBM Research, Orsay, France. |
| Pseudocode | Yes | Algorithm 1 presents the pseudocode for DEX3.P. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is open-source or publicly available. |
| Open Datasets | No | We simulate an environment where these values follow a Gaussian random walk. That is, for every t ∈ [T] and i < j, P_{t+1}(i, j) = P_t(i, j) + ε_t(i, j), where ε_t(i, j) ∼ N(0, 0.002). ... The initial values P_1(i, j) ∼ Uniform(0, 1). |
| Dataset Splits | No | The paper describes generating synthetic data for simulations but does not specify distinct training, validation, and test dataset splits. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, frameworks). |
| Experiment Setup | Yes | The values of parameters α, β, η, and γ for DEX3.P and DEX3.S were set in accordance with Theorems 3.3 and 4.1 (or 4.4, as appropriate from the context), respectively. ... We simulate an environment where these values follow a Gaussian random walk. That is, for every t ∈ [T] and i < j, P_{t+1}(i, j) = P_t(i, j) + ε_t(i, j), where ε_t(i, j) ∼ N(0, 0.002). |
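The simulated environment quoted above (initial preferences P_1(i, j) ∼ Uniform(0, 1) for i < j, then a Gaussian random walk P_{t+1}(i, j) = P_t(i, j) + ε_t(i, j) with ε_t(i, j) ∼ N(0, 0.002)) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `simulate_preference_walk` is ours, we read N(0, 0.002) as mean 0 and variance 0.002, and clipping entries to [0, 1] is our assumption since the paper's quoted text does not say how out-of-range values are handled.

```python
import numpy as np

def simulate_preference_walk(K=10, T=1000, var=0.002, seed=0):
    """Sketch of the paper's synthetic environment (names and clipping are our
    assumptions): preference matrices P_1, ..., P_T over K arms, where the
    upper-triangular entries follow a Gaussian random walk with variance `var`,
    P(j, i) = 1 - P(i, j), and P(i, i) = 0.5."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices(K, k=1)                      # index pairs with i < j
    upper = rng.uniform(0.0, 1.0, size=len(iu[0]))    # P_1(i, j) ~ Uniform(0, 1)
    P = np.zeros((T, K, K))
    for t in range(T):
        upper = np.clip(upper, 0.0, 1.0)              # assumption: keep valid probabilities
        P[t][iu] = upper                              # P_t(i, j) for i < j
        P[t].T[iu] = 1.0 - upper                      # P_t(j, i) = 1 - P_t(i, j)
        np.fill_diagonal(P[t], 0.5)                   # an arm ties against itself
        upper = upper + rng.normal(0.0, np.sqrt(var), size=upper.shape)  # random-walk step
    return P
```

Each `P[t]` is then a valid pairwise-preference matrix (all entries in [0, 1], with P(i, j) + P(j, i) = 1), so the non-stationarity comes entirely from the per-step Gaussian drift of the upper-triangular entries.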