Regret Analysis for Continuous Dueling Bandit
Authors: Wataru Kumagai
NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose a stochastic mirror descent algorithm and show that the algorithm achieves an O(√(T log T))-regret bound under strong convexity and smoothness assumptions for the cost function. Subsequently, we clarify the equivalence between regret minimization in dueling bandit and convex optimization for the cost function. Moreover, when considering a lower bound in convex optimization, our algorithm is shown to achieve the optimal convergence rate in convex optimization and the optimal regret in dueling bandit except for a logarithmic factor. (The regret notion this bound refers to is recalled below the table.) |
| Researcher Affiliation | Academia | Wataru Kumagai, Center for Advanced Intelligence Project, RIKEN, 1-4-1, Nihonbashi, Chuo, Tokyo 103-0027, Japan, wataru.kumagai@riken.jp |
| Pseudocode | Yes | Algorithm 1: Noisy Comparison-based Stochastic Mirror Descent (NC-SMD). Input: learning rate η, ν-self-concordant function R, time horizon T, tuning parameters λ, µ. Initialize: a_1 = argmin_{a∈A} R(a). For t = 1 to T do: update R_t(a) = R(a) + (λη/2) Σ_{i=1}^{t} ‖a − a_i‖² + µ‖a‖²; pick a unit vector u_t uniformly at random; compare a_t and a'_t := a_t + ∇²R_t(a_t)^(−1/2) u_t and receive F(a'_t, a_t); set ĝ_t = F(a'_t, a_t) · d · ∇²R_t(a_t)^(1/2) u_t; set a_{t+1} = ∇R_t^(−1)(∇R_t(a_t) − η ĝ_t). End for. Output: a_{T+1}. (A runnable sketch of this update is given below the table.) |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | The paper is theoretical and does not utilize or mention any specific dataset, so no information about public availability is provided. |
| Dataset Splits | No | The paper is theoretical and does not include empirical validation with dataset splits. Therefore, no specific dataset split information is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe any computational experiments, so no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not describe any computational experiments, so no software dependencies with version numbers are listed. |
| Experiment Setup | No | The paper is theoretical and does not include specific experimental setup details such as hyperparameters or training configurations. |
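
For the Research Type row above: the quoted O(√(T log T)) result is a cumulative regret bound. As a hedged reminder of what that quantity usually means (this is the standard continuous dueling-bandit regret from the literature, not a verbatim copy of the paper's notation): with a* the optimal action, P(a, b) the probability that a wins a duel against b, and (a_t, a'_t) the pair compared at round t,

```latex
% Standard dueling-bandit regret after T rounds (assumed formulation, not quoted from the paper)
\mathrm{Reg}_T \;=\; \sum_{t=1}^{T} \bigl( P(a^{*}, a_t) + P(a^{*}, a'_t) - 1 \bigr)
```

The paper's result controls a quantity of this form, bounding it by O(√(T log T)) under strong convexity and smoothness of the underlying cost function.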
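
The Pseudocode row reproduces Algorithm 1 (NC-SMD) in flattened form. Below is a minimal, hedged Python sketch of that procedure, written only to make the update rule concrete. The choice of action set (the open unit ball), the log-barrier R(a) = −log(1 − ‖a‖²) as the self-concordant function, the damped-Newton solver for the mirror step, and the names `nc_smd`, `compare`, `eta`, `lam`, `mu` are illustrative assumptions, not specified by the paper.

```python
import numpy as np


def nc_smd(compare, d, T, eta=0.1, lam=1.0, mu=1.0, rng=None):
    """Hedged sketch of NC-SMD (Algorithm 1) on the open unit ball in R^d.

    Illustrative assumptions (not fixed by the paper's generic statement):
      * action set A = open unit ball,
      * self-concordant function R(a) = -log(1 - ||a||^2),
      * the mirror step (grad R_t)^{-1} is solved by damped Newton iterations.

    `compare(a_prime, a)` is the noisy comparison oracle F(a', a),
    e.g. returning +1 if a' wins the duel and -1 otherwise.
    """
    rng = np.random.default_rng() if rng is None else rng

    def grad_R(x):  # gradient of the log-barrier -log(1 - ||x||^2)
        return 2.0 * x / (1.0 - x @ x)

    def hess_R(x):  # Hessian of the log-barrier
        s = 1.0 - x @ x
        return 2.0 * np.eye(d) / s + 4.0 * np.outer(x, x) / s ** 2

    history = []          # past iterates a_1, ..., a_t
    a = np.zeros(d)       # a_1 = argmin_{a in A} R(a) for the log-barrier

    for t in range(1, T + 1):
        history.append(a.copy())
        hist_sum = np.sum(history, axis=0)

        # R_t(a) = R(a) + (lam*eta/2) * sum_{i<=t} ||a - a_i||^2 + mu*||a||^2
        def grad_Rt(x):
            return grad_R(x) + lam * eta * (t * x - hist_sum) + 2.0 * mu * x

        def hess_Rt(x):
            return hess_R(x) + (lam * eta * t + 2.0 * mu) * np.eye(d)

        # unit direction u_t, uniform on the sphere
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)

        # duel point a'_t = a_t + (hess R_t(a_t))^{-1/2} u_t
        w, V = np.linalg.eigh(hess_Rt(a))
        H_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
        H_sqrt = V @ np.diag(w ** 0.5) @ V.T
        a_query = a + H_inv_sqrt @ u

        # duel a_t against a'_t and receive the noisy comparison F(a'_t, a_t)
        f = compare(a_query, a)

        # gradient estimate g_t = F(a'_t, a_t) * d * (hess R_t(a_t))^{1/2} u_t
        g_hat = f * d * (H_sqrt @ u)

        # mirror step: find a_{t+1} with grad R_t(a_{t+1}) = grad R_t(a_t) - eta*g_t
        target = grad_Rt(a) - eta * g_hat
        x = a.copy()
        for _ in range(50):                      # damped Newton iterations
            grad_err = grad_Rt(x) - target
            step = np.linalg.solve(hess_Rt(x), grad_err)
            decrement = np.sqrt(max(grad_err @ step, 0.0))
            if decrement < 1e-9:
                break
            x = x - step / (1.0 + decrement)     # damping keeps x inside the ball
        a = x

    return a  # a_{T+1}
```

A comparison oracle for testing this sketch can be simulated, for example, by returning ±1 with a win probability given by a link function applied to the difference of a hidden strongly convex cost; under that kind of cost-based preference model, the returned point approximates the cost minimizer.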