Regret Analysis for Continuous Dueling Bandit

Authors: Wataru Kumagai

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We propose a stochastic mirror descent algorithm and show that the algorithm achieves an O(√(T log T))-regret bound under strong convexity and smoothness assumptions for the cost function. Subsequently, we clarify the equivalence between regret minimization in dueling bandit and convex optimization for the cost function. Moreover, when considering a lower bound in convex optimization, our algorithm is shown to achieve the optimal convergence rate in convex optimization and the optimal regret in dueling bandit except for a logarithmic factor.
Researcher Affiliation | Academia | Wataru Kumagai, Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihonbashi, Chuo, Tokyo 103-0027, Japan. wataru.kumagai@riken.jp
Pseudocode | Yes | Algorithm 1: Noisy Comparison-based Stochastic Mirror Descent (NC-SMD)
    Input: learning rate η, ν-self-concordant function R, time horizon T, tuning parameters λ, µ
    Initialize: a_1 = argmin_{a∈A} R(a)
    for t = 1 to T do
        Update R_t(a) = R(a) + (λη/2) Σ_{i=1}^{t} ‖a − a_i‖² + µ‖a‖²
        Pick a unit vector u_t uniformly at random
        Compare a_t and a'_t := a_t + ∇²R_t(a_t)^{−1/2} u_t and receive F(a'_t, a_t)
        Set ĝ_t = F(a'_t, a_t) · d · ∇²R_t(a_t)^{1/2} u_t
        Set a_{t+1} = ∇R_t^{−1}(∇R_t(a_t) − η ĝ_t)
    end for
    Output: a_{T+1}
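The NC-SMD loop above can be sketched in code. This is a minimal illustration, not the paper's implementation: it substitutes a simple quadratic regularizer R(a) = ‖a‖²/2 for the paper's ν-self-concordant function (so every R_t is quadratic and its gradient map inverts in closed form), and it assumes a hypothetical cost function and a logistic noisy-comparison feedback model, none of which come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(a):
    # Hypothetical strongly convex, smooth cost (illustration only).
    return np.sum((a - 0.3) ** 2)

def compare(a1, a2):
    # Assumed noisy one-bit feedback: returns 1 if a1 is judged worse
    # than a2, else 0, via a logistic link on the cost difference.
    p = 1.0 / (1.0 + np.exp(-(f(a1) - f(a2))))
    return float(rng.random() < p)

def nc_smd(d=2, T=2000, eta=0.05, lam=0.1, mu=0.1):
    """Sketch of NC-SMD with R(a) = ||a||^2 / 2, so that
    R_t(a) = R(a) + (lam*eta/2) * sum_i ||a - a_i||^2 + mu*||a||^2
    has Hessian c*I with c = 1 + 2*mu + lam*eta*t."""
    a = np.zeros(d)          # a_1 = argmin_a R(a)
    past_sum = np.zeros(d)   # running sum a_1 + ... + a_t
    for t in range(1, T + 1):
        past_sum += a
        c = 1.0 + 2.0 * mu + lam * eta * t
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)           # uniform unit vector u_t
        a_probe = a + u / np.sqrt(c)     # a'_t = a_t + (∇²R_t)^{-1/2} u_t
        g_hat = compare(a_probe, a) * d * np.sqrt(c) * u
        # Mirror step a_{t+1} = (∇R_t)^{-1}(∇R_t(a_t) - eta * ĝ_t),
        # where ∇R_t(a) = c*a - lam*eta*past_sum is affine and invertible.
        grad = c * a - lam * eta * past_sum
        a = (grad - eta * g_hat + lam * eta * past_sum) / c
    return a

a_final = nc_smd()
```

Because R_t here is quadratic, the mirror step reduces to a damped gradient step; with a genuinely self-concordant R, the Hessian square roots and the inverse gradient map would instead have to be computed numerically at each round.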
Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the methodology, nor does it provide a link to a code repository.
Open Datasets | No | The paper is theoretical and does not utilize or mention any specific dataset, so no information about public availability is provided.
Dataset Splits | No | The paper is theoretical and does not include empirical validation with dataset splits, so no dataset split information is provided.
Hardware Specification | No | The paper is theoretical and does not describe any computational experiments, so no hardware specifications are provided.
Software Dependencies | No | The paper is theoretical and does not describe any computational experiments, so no software dependencies with version numbers are listed.
Experiment Setup | No | The paper is theoretical and does not include experimental setup details such as hyperparameters or training configurations.