A Smoother Way to Train Structured Prediction Models

Authors: Venkata Krishna Pillutla, Vincent Roulet, Sham M. Kakade, Zaid Harchaoui

NeurIPS 2018

Reproducibility Variable | Result | LLM Response

Research Type | Experimental
  "We present experimental results on two real-world problems, namely named entity recognition and visual object localization. The experimental results show that the proposed framework allows us to build upon efficient inference algorithms to develop large-scale optimization algorithms for structured prediction which can achieve competitive performance on the two real-world problems."

Researcher Affiliation | Academia
  "Krishna Pillutla, Vincent Roulet, Sham M. Kakade, Zaid Harchaoui. Paul G. Allen School of Computer Science & Engineering and Department of Statistics, University of Washington. name@uw.edu"

Pseudocode | Yes
  "Algorithm 1 Catalyst with smoothing"

Open Source Code | Yes
  "The code is publicly available on the authors' websites."

Open Datasets | Yes
  "We consider the CoNLL 2003 dataset with n = 14987 [63]. We consider the PASCAL VOC 2007 [13] dataset."

Dataset Splits | No
  The paper mentions using a "held-out set" and "validation F1 score" for tuning, but does not provide specific details on the dataset splits (e.g., percentages or sample counts for train/validation/test).

Hardware Specification | No
  No specific hardware details (such as GPU/CPU models, memory, or cloud instance types) used for running the experiments are provided in the paper.

Software Dependencies | No
  The paper does not provide ancillary software details with version numbers (e.g., library or solver names with versions) needed to replicate the experiments.

Experiment Setup | Yes
  "BCFW requires no tuning, while SGD requires the tuning of γ0 and t0. The SVRG-based methods require the tuning of a fixed learning rate. Moreover, SVRG and SC-SVRG-const also require tuning the amount of smoothing µ. [...] A fixed budget Tinner = n is used as the stopping criterion in Algorithm 1. [...] We use the value κk = λ for SC-SVRG-adapt. All smooth optimization methods turned out to be robust to the choice of K for the top-K oracle (Fig. 3); we use K = 5 for named entity recognition and K = 10 for visual object localization."
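The report above notes that the paper's pseudocode is "Algorithm 1 Catalyst with smoothing" and that its setup fixes an inner budget of Tinner = n. The following is a minimal, hypothetical sketch of that outer-loop structure on a scalar toy problem: the nonsmooth hinge is replaced by a smoothed surrogate, each outer step approximately solves a prox-regularized subproblem with n inner gradient steps, and the prox center is updated by Nesterov-style extrapolation. The toy data, the plain-gradient inner solver (standing in for the paper's SVRG-style solver), and the µ and β schedules are all assumptions for illustration, not the authors' actual implementation, which smooths the structured max-margin objective via a top-K inference oracle.

```python
import math

# Hypothetical toy instance: four scalar (x, y) pairs. The real method
# operates on structured outputs; this is only a shape-of-the-algorithm demo.
DATA = [(1.0, 1), (2.0, -1), (0.5, 1), (1.5, -1)]

def smoothed_grad(w, mu):
    """Gradient of the entropy-smoothed hinge, mean over DATA.

    mu * log(1 + exp((1 - y*x*w)/mu)) smooths max(0, 1 - y*x*w); its
    derivative w.r.t. w is sigmoid((1 - y*x*w)/mu) * (-y*x).
    """
    g = 0.0
    for x, y in DATA:
        s = (1.0 - y * x * w) / mu
        sig = 1.0 / (1.0 + math.exp(-s))
        g += sig * (-y * x)
    return g / len(DATA)

def catalyst_with_smoothing(w0, n_outer=20, kappa=1.0, lr=0.1):
    """Sketch of the Catalyst-with-smoothing outer loop.

    Outer step k approximately minimizes
        f_{mu_k}(w) + (kappa/2) * (w - z)^2
    using a fixed inner budget T_inner = n (mirroring the paper's
    stopping criterion), then extrapolates the prox center z.
    """
    w_prev = w = z = w0
    for k in range(1, n_outer + 1):
        mu = 1.0 / k                      # assumed decreasing smoothing schedule
        for _ in range(len(DATA)):        # T_inner = n inner steps
            grad = smoothed_grad(w, mu) + kappa * (w - z)
            w -= lr * grad
        beta = k / (k + 3.0)              # Nesterov-style extrapolation weight
        z = w + beta * (w - w_prev)
        w_prev = w
    return w
```

Running `catalyst_with_smoothing(0.0)` drives the (nonsmooth) hinge objective on the toy data below its value at the starting point; the decreasing µ schedule means later outer iterations work with a tighter approximation of the original nonsmooth loss.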