Stochastic Online Greedy Learning with Semi-bandit Feedbacks

Authors: Tian Lin, Jian Li, Wei Chen

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | "We first propose the greedy regret and ϵ-quasi greedy regret as learning metrics comparing with the performance of offline greedy algorithm. We then propose two online greedy learning algorithms with semi-bandit feedbacks... Both algorithms achieve O(log T) problem-dependent regret bound... We further show that the bound is tight... Due to the space constraint, the analysis of algorithms, applications and empirical evaluation of the lower bound are moved to the supplementary material." (A regret sketch follows the table.)
Researcher Affiliation | Collaboration | Tian Lin (Tsinghua University, Beijing, China; lintian06@gmail.com); Jian Li (Tsinghua University, Beijing, China; lapordge@gmail.com); Wei Chen (Microsoft Research, Beijing, China; weic@microsoft.com)
Pseudocode | Yes | Algorithm 1: OG; Subroutine 2: UCB(A, X̂(·), N(·), t); Subroutine 3: LUCB_{ϵ,δ}(A, X̂(·), N(·), t); Algorithm 4: OG-LUCB-R (i.e., OG-LUCB with Restart). (An illustrative code sketch follows the table.)
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | No | The paper does not mention any specific dataset used for training or provide access information for a publicly available or open dataset.
Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits.
Hardware Specification | No | The paper does not explicitly describe the hardware specifications used to run its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | No | The paper does not contain specific experimental setup details, such as concrete hyperparameter values or training configurations.
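The Research Type row quotes the abstract's claim of an O(log T) problem-dependent bound on the greedy regret. As a reading aid, here is a minimal sketch of how such a regret is commonly formalized, assuming f denotes the expected reward, S^g the solution returned by the offline greedy algorithm, and S_t the solution played in round t; this notation is an assumption based on the abstract, not the paper's own definitions.

```latex
% Hedged sketch: a plausible form of the greedy regret after T rounds,
% assuming f is the expected reward, S^g the offline greedy solution,
% and S_t the solution played in round t (notation assumed, not quoted).
R^{g}(T) \;=\; T \cdot f(S^{g}) \;-\; \mathbb{E}\left[ \sum_{t=1}^{T} f(S_t) \right],
\qquad R^{g}(T) = O(\log T) \ \text{(problem-dependent)}.
```

The ϵ-quasi greedy regret mentioned in the same quote relaxes this benchmark by tolerating a small per-step slack in the greedy choices; its exact definition is given in the paper and is not reproduced here.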
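The Pseudocode row lists Algorithm 1 (OG), the UCB and LUCB subroutines, and OG-LUCB-R, but no source code is released. The snippet below is a hedged, illustrative sketch (not the authors' implementation) of one OG-style round that extends a partial solution greedily using standard UCB indices and then absorbs semi-bandit feedback; all identifiers (og_round, accessible, emp_mean, update_semi_bandit, and so on) are hypothetical.

```python
import math

# Hedged sketch (not the authors' code): an online greedy (OG-style) round
# that builds a solution element by element using UCB indices, under
# semi-bandit feedback where each chosen element's own reward is observed.

def ucb_index(mean, count, t):
    """Standard UCB index; untried elements get +inf so they are explored first."""
    if count == 0:
        return float("inf")
    return mean + math.sqrt(1.5 * math.log(t) / count)

def og_round(accessible, emp_mean, count, t, max_phases):
    """Build one solution greedily, phase by phase, using UCB indices.

    accessible(S) -> iterable of elements that may extend the partial solution S.
    emp_mean / count: per-element empirical means and play counts.
    Returns the constructed solution as a list of elements.
    """
    solution = []
    for _ in range(max_phases):
        candidates = [e for e in accessible(solution) if e not in solution]
        if not candidates:
            break
        # Greedy step: pick the candidate with the largest UCB index.
        chosen = max(candidates, key=lambda e: ucb_index(emp_mean[e], count[e], t))
        solution.append(chosen)
    return solution

def update_semi_bandit(solution, observed, emp_mean, count):
    """Semi-bandit feedback: every element of the played solution reveals its reward."""
    for e in solution:
        count[e] += 1
        emp_mean[e] += (observed[e] - emp_mean[e]) / count[e]
```

A caller would initialize emp_mean and count to zero for every element, call og_round once per round t = 1, 2, ..., observe the reward of each chosen element, and pass those observations to update_semi_bandit. The LUCB_{ϵ,δ} variant named in the table replaces the simple index maximization with a lower/upper confidence-bound comparison that stops a phase once the best candidate is separated from the rest; that logic is not reproduced in this sketch.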