Beyond Bandit Feedback in Online Multiclass Classification
Authors: Dirk van der Hoeven, Federico Fusco, Nicolò Cesa-Bianchi
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on synthetic data show that for various feedback graphs our algorithm is competitive against known baselines. |
| Researcher Affiliation | Academia | Dirk van der Hoeven (dirk@dirkvanderhoeven.com), Dept. of Computer Science, Università degli Studi di Milano, Italy; Federico Fusco (fuscof@diag.uniroma1.it), Dept. of Computer, Control and Management Engineering, Sapienza Università di Roma, Italy; Nicolò Cesa-Bianchi (nicolo.cesa-bianchi@unimi.it), DSRC & Dept. of Computer Science, Università degli Studi di Milano, Italy |
| Pseudocode | Yes | Algorithm 1: GAPPLETRON |
| Open Source Code | Yes | (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] |
| Open Datasets | No | We empirically evaluated the performance of GAPPLETRON on synthetic data in the bandit, multiclass filtering, and full information settings. Similarly to the SynSep and SynNonSep datasets described in (Kakade et al., 2008), we generated synthetic datasets with d ∈ {80, 120, 160}, K ∈ {6, 9, 12}, and label noise rate in {0, 0.05, 0.1}. (A hedged data-generation sketch follows the table.) |
| Dataset Splits | No | No explicit details on training, validation, or test dataset splits (percentages, counts, or cross-validation) were provided. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or memory) were explicitly provided for the experimental setup within the given text. |
| Software Dependencies | No | The paper mentions 'Online Gradient Descent' but does not specify software names with version numbers for reproducibility. |
| Experiment Setup | Yes | We used three surrogate losses for GAPPLETRON: the logistic loss ℓt(Wt) = −log_K q(Wt, xt, yt), where q is the softmax, the hinge loss defined in (5), and the smooth hinge loss (Rennie and Srebro, 2005), denoted by GapLog, GapHin, and GapSmH respectively. The OCO algorithm used with all losses is Online Gradient Descent, with learning rate ηt = (10⁻⁸ + Σ_{j=1}^t ‖∇ℓ̂j(Wj)‖₂²)^(−1/2) and no projections. (A hedged sketch of this update follows the table.) |
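The learning-rate formula quoted above is concrete enough to illustrate. Below is a minimal Python sketch of Online Gradient Descent with that AdaGrad-style global step size, paired with the logistic surrogate under full-information feedback. The function names (`softmax`, `logistic_grad`, `ogd_adaptive`) and the full-information gradient are our own illustration, not the authors' code: GAPPLETRON itself plays randomized predictions and feeds OGD gradient estimates built from graph feedback, which this sketch omits.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a vector of class scores.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def logistic_grad(W, x, y, K):
    # Gradient of the logistic surrogate -log_K q(W, x, y), where
    # q(W, x, y) is the softmax probability of class y under scores W @ x.
    q = softmax(W @ x)
    g = np.outer(q, x)
    g[y] -= x              # row k of the gradient is (q_k - 1{k = y}) x
    return g / np.log(K)   # rescale natural log to base K

def ogd_adaptive(stream, d, K, eps=1e-8):
    # Online Gradient Descent with the global adaptive step size
    # eta_t = (eps + sum_{j<=t} ||g_j||_2^2)^(-1/2) and no projections.
    W = np.zeros((K, d))
    sq_sum = 0.0
    mistakes = 0
    for x, y in stream:
        mistakes += int(np.argmax(W @ x) != y)  # deterministic prediction
        g = logistic_grad(W, x, y, K)
        sq_sum += float((g * g).sum())
        W -= (eps + sq_sum) ** -0.5 * g         # gradient step
    return W, mistakes
```

Calling `ogd_adaptive(zip(X, y), d, K)` on a labeled stream returns the final weight matrix together with the online mistake count.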
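For the synthetic data, only the dimensions d, class counts K, and noise rates are quoted, so the generator below is a guess at the spirit of SynSep/SynNonSep: a linearly separable stream whose labels are resampled uniformly at the given noise rate. The actual construction in Kakade et al. (2008) and in the paper's experiments may differ. The example run reuses `ogd_adaptive` from the sketch above.

```python
import numpy as np

def make_synthetic(T, d, K, noise_rate, seed=0):
    # Linearly separable multiclass stream with optional label noise,
    # loosely modeled on SynSep / SynNonSep (Kakade et al., 2008).
    rng = np.random.default_rng(seed)
    W_star = rng.standard_normal((K, d))             # hidden linear labeler
    X = rng.standard_normal((T, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)    # unit-norm examples
    y = (X @ W_star.T).argmax(axis=1)                # noiseless labels
    flip = rng.random(T) < noise_rate                # resample a noise_rate
    y[flip] = rng.integers(K, size=int(flip.sum()))  # fraction uniformly
    return X, y

# Example run matching one reported configuration (d=80, K=6, 5% noise).
X, y = make_synthetic(T=5000, d=80, K=6, noise_rate=0.05)
W, mistakes = ogd_adaptive(zip(X, y), d=80, K=6)  # sketch defined above
print(f"online mistakes over {len(y)} rounds: {mistakes}")
```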