Regret Analysis for Continuous Dueling Bandit
Authors: Wataru Kumagai
NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose a stochastic mirror descent algorithm and show that the algorithm achieves an O(√(T log T))-regret bound under strong convexity and smoothness assumptions for the cost function. Subsequently, we clarify the equivalence between regret minimization in dueling bandit and convex optimization for the cost function. Moreover, when considering a lower bound in convex optimization, our algorithm is shown to achieve the optimal convergence rate in convex optimization and the optimal regret in dueling bandit except for a logarithmic factor. (The regret notion this bound refers to is recalled below the table.) |
| Researcher Affiliation | Academia | Wataru Kumagai, Center for Advanced Intelligence Project, RIKEN, 1-4-1, Nihonbashi, Chuo, Tokyo 103-0027, Japan, wataru.kumagai@riken.jp |
| Pseudocode | Yes | Algorithm 1: Noisy Comparison-based Stochastic Mirror Descent (NC-SMD). Input: learning rate η, ν-self-concordant function R, time horizon T, tuning parameters λ, µ. Initialize: a_1 = argmin_{a∈A} R(a). For t = 1 to T do: update R_t(a) = R(a) + (λη/2) Σ_{i=1}^{t} ‖a − a_i‖² + µ‖a‖²; pick a unit vector u_t uniformly at random; compare a_t and a'_t := a_t + ∇²R_t(a_t)^(−1/2) u_t and receive F(a'_t, a_t); set ĝ_t = F(a'_t, a_t) · d · ∇²R_t(a_t)^(1/2) u_t; set a_{t+1} = ∇R_t^(−1)(∇R_t(a_t) − η ĝ_t). End for. Output: a_{T+1}. (A runnable sketch of this update is given below the table.) |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper is theoretical and does not utilize or mention any specific dataset, so no information about public availability is provided. |
| Dataset Splits | No | The paper is theoretical and does not include empirical validation with dataset splits. Therefore, no specific dataset split information is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any computational experiments, so no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not describe any computational experiments, so no software dependencies with version numbers are listed. |
| Experiment Setup | No | The paper is theoretical and does not include specific experimental setup details such as hyperparameters or training configurations. |
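
For the Research Type row above: the quoted O(√(T log T)) result is a cumulative regret bound. As a hedged reminder of what that quantity usually means (this is the standard continuous dueling-bandit regret from the literature, not a verbatim copy of the paper's notation): with a* the optimal action, P(a, b) the probability that a wins a duel against b, and (a_t, a'_t) the pair compared at round t,

```latex
% Standard dueling-bandit regret after T rounds (assumed formulation, not quoted from the paper)
\mathrm{Reg}_T \;=\; \sum_{t=1}^{T} \bigl( P(a^{*}, a_t) + P(a^{*}, a'_t) - 1 \bigr)
```

The paper's result controls a quantity of this form, bounding it by O(√(T log T)) under strong convexity and smoothness of the underlying cost function.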
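
The Pseudocode row reproduces Algorithm 1 (NC-SMD) in flattened form. Below is a minimal, hedged Python sketch of that procedure, written only to make the update rule concrete. The choice of action set (the open unit ball), the log-barrier R(a) = −log(1 − ‖a‖²) as the self-concordant function, the damped-Newton solver for the mirror step, and the names `nc_smd`, `compare`, `eta`, `lam`, `mu` are illustrative assumptions, not specified by the paper.

```python
import numpy as np


def nc_smd(compare, d, T, eta=0.1, lam=1.0, mu=1.0, rng=None):
    """Hedged sketch of NC-SMD (Algorithm 1) on the open unit ball in R^d.

    Illustrative assumptions (not fixed by the paper's generic statement):
      * action set A = open unit ball,
      * self-concordant function R(a) = -log(1 - ||a||^2),
      * the mirror step (grad R_t)^{-1} is solved by damped Newton iterations.

    `compare(a_prime, a)` is the noisy comparison oracle F(a', a),
    e.g. returning +1 if a' wins the duel and -1 otherwise.
    """
    rng = np.random.default_rng() if rng is None else rng

    def grad_R(x):  # gradient of the log-barrier -log(1 - ||x||^2)
        return 2.0 * x / (1.0 - x @ x)

    def hess_R(x):  # Hessian of the log-barrier
        s = 1.0 - x @ x
        return 2.0 * np.eye(d) / s + 4.0 * np.outer(x, x) / s ** 2

    history = []          # past iterates a_1, ..., a_t
    a = np.zeros(d)       # a_1 = argmin_{a in A} R(a) for the log-barrier

    for t in range(1, T + 1):
        history.append(a.copy())
        hist_sum = np.sum(history, axis=0)

        # R_t(a) = R(a) + (lam*eta/2) * sum_{i<=t} ||a - a_i||^2 + mu*||a||^2
        def grad_Rt(x):
            return grad_R(x) + lam * eta * (t * x - hist_sum) + 2.0 * mu * x

        def hess_Rt(x):
            return hess_R(x) + (lam * eta * t + 2.0 * mu) * np.eye(d)

        # unit direction u_t, uniform on the sphere
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)

        # duel point a'_t = a_t + (hess R_t(a_t))^{-1/2} u_t
        w, V = np.linalg.eigh(hess_Rt(a))
        H_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
        H_sqrt = V @ np.diag(w ** 0.5) @ V.T
        a_query = a + H_inv_sqrt @ u

        # duel a_t against a'_t and receive the noisy comparison F(a'_t, a_t)
        f = compare(a_query, a)

        # gradient estimate g_t = F(a'_t, a_t) * d * (hess R_t(a_t))^{1/2} u_t
        g_hat = f * d * (H_sqrt @ u)

        # mirror step: find a_{t+1} with grad R_t(a_{t+1}) = grad R_t(a_t) - eta*g_t
        target = grad_Rt(a) - eta * g_hat
        x = a.copy()
        for _ in range(50):                      # damped Newton iterations
            grad_err = grad_Rt(x) - target
            step = np.linalg.solve(hess_Rt(x), grad_err)
            decrement = np.sqrt(max(grad_err @ step, 0.0))
            if decrement < 1e-9:
                break
            x = x - step / (1.0 + decrement)     # damping keeps x inside the ball
        a = x

    return a  # a_{T+1}
```

A comparison oracle for testing this sketch can be simulated, for example, by returning ±1 with a win probability given by a link function applied to the difference of a hidden strongly convex cost; under that kind of cost-based preference model, the returned point approximates the cost minimizer.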