Beyond Bandit Feedback in Online Multiclass Classification

Authors: Dirk van der Hoeven, Federico Fusco, Nicolò Cesa-Bianchi

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on synthetic data show that for various feedback graphs our algorithm is competitive against known baselines.
Researcher Affiliation | Academia | Dirk van der Hoeven (dirk@dirkvanderhoeven.com), Dept. of Computer Science, Università degli Studi di Milano, Italy; Federico Fusco (fuscof@diag.uniroma1.it), Dept. of Computer, Control and Management Engineering, Sapienza Università di Roma, Italy; Nicolò Cesa-Bianchi (nicolo.cesa-bianchi@unimi.it), DSRC & Dept. of Computer Science, Università degli Studi di Milano, Italy
Pseudocode | Yes | Algorithm 1: GAPPLETRON
Open Source Code | Yes | (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
Open Datasets | No | We empirically evaluated the performance of GAPPLETRON on synthetic data in the bandit, multiclass filtering, and full information settings. Similarly to the SynSep and SynNonSep datasets described in (Kakade et al., 2008), we generated synthetic datasets with d ∈ {80, 120, 160}, K ∈ {6, 9, 12}, and the label noise rate in {0, 0.05, 0.1}. (A hypothetical data-generation sketch in this spirit appears below the table.)
Dataset Splits | No | No explicit details on training, validation, or test dataset splits (percentages, counts, or cross-validation) were provided.
Hardware Specification | No | No specific hardware details (like GPU/CPU models or memory) were explicitly provided for the experimental setup within the given text.
Software Dependencies | No | The paper mentions 'Online Gradient Descent' but does not specify software names with version numbers for reproducibility.
Experiment Setup | Yes | We used three surrogate losses for GAPPLETRON: the logistic loss ℓ_t(W_t) = −log_K q(W_t, x_t, y_t), where q is the softmax, the hinge loss defined in (5), and the smooth hinge loss (Rennie and Srebro, 2005), denoted by GAPLOG, GAPHIN, and GAPSMH respectively. The OCO algorithm used with all losses is Online Gradient Descent, with learning rate η_t = (10^{-8} + Σ_{j=1}^{t} ‖∇ℓ̂_j(W_t)‖_2^2)^{-1/2} and no projections. (A hedged sketch of this setup appears below the table.)
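
The synthetic data in the Open Datasets row is only described at the level of the parameters d, K, and the label-noise rate, so the snippet below is a hypothetical reconstruction rather than the paper's code: it builds a (near-)linearly-separable K-class problem in d dimensions with a configurable noise rate, loosely in the spirit of the SynSep / SynNonSep datasets of Kakade et al. (2008). The function name make_synthetic and the prototype-based construction are assumptions made here for illustration.

```python
import numpy as np

def make_synthetic(n=10_000, d=120, K=9, noise=0.05, separable=True, seed=0):
    """Hypothetical generator loosely following the SynSep / SynNonSep recipe of
    Kakade et al. (2008): draw feature vectors, label each one by the class
    prototype with the largest inner product, then flip a fraction of the
    labels uniformly at random.  The paper's exact construction is not given."""
    rng = np.random.default_rng(seed)
    prototypes = rng.standard_normal((K, d))        # one direction per class
    X = rng.standard_normal((n, d))
    y = np.argmax(X @ prototypes.T, axis=1)         # linearly separable labels
    if not separable:
        X = X + 0.5 * rng.standard_normal((n, d))   # blur the margin (non-separable variant)
    flip = rng.random(n) < noise                    # label-noise rate in {0, 0.05, 0.1}
    y[flip] = rng.integers(0, K, size=int(flip.sum()))
    return X, y

# Example: one of the (d, K) grid points from the table, with 10% label noise.
X, y = make_synthetic(d=80, K=6, noise=0.1)
```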
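The Experiment Setup row names the surrogate losses and the Online Gradient Descent configuration, but not the full GAPPLETRON algorithm. The sketch below shows only that OCO component under full information: a base-K logistic loss (our reading of the extracted formula, including the minus sign) and OGD with the adaptive learning rate quoted above and no projections. The function names and the omission of the feedback-graph exploration step are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                         # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def logistic_grad(W, x, y, K):
    """Gradient of the base-K logistic loss -log_K q_y, with q = softmax(Wx).
    The minus sign and the base-K logarithm are our reading of the table's formula."""
    q = softmax(W @ x)                      # class probabilities, shape (K,)
    g = q.copy()
    g[y] -= 1.0                             # derivative of -log q_y w.r.t. the scores
    return np.outer(g, x) / np.log(K)       # base-K log rescales the gradient by 1/ln K

def ogd_adaptive(stream, d, K):
    """Online Gradient Descent with learning rate
    eta = (1e-8 + sum of squared gradient norms so far) ** (-1/2)
    and no projection, as in the Experiment Setup row.  The feedback-graph /
    exploration machinery of GAPPLETRON is deliberately omitted (full
    information is assumed), so this is only the OCO component."""
    W = np.zeros((K, d))
    grad_sq = 0.0
    mistakes = 0
    for x, y in stream:
        mistakes += int(np.argmax(W @ x) != y)   # predict with the current weights
        g = logistic_grad(W, x, y, K)
        grad_sq += float((g ** 2).sum())
        W -= (1e-8 + grad_sq) ** -0.5 * g        # adaptive step, no projection
    return W, mistakes
```

With the synthetic generator above, something like W, m = ogd_adaptive(zip(X, y), d=X.shape[1], K=int(y.max()) + 1) would run the learner over one pass of the stream.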