Calibrated Structured Prediction

Authors: Volodymyr Kuleshov, Percy S. Liang

NeurIPS 2015

Each reproducibility variable is listed below with its result and the supporting LLM response.
Research Type: Experimental
LLM Response: "We test our proposed recalibrators and features on three real-world tasks. Multiclass image classification. ... Optical character recognition. ... Scene understanding. ... We report results using calibration curves..."
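The calibration curves mentioned in that excerpt bin forecasts by predicted probability and compare each bin's mean forecast with the empirical frequency of the event; a well-calibrated forecaster lies on the diagonal. Below is a minimal sketch, not the paper's code, using the 0.1 bucket width quoted in the experiment setup further down:

```python
# Sketch of a calibration curve: group forecasts into probability buckets
# (width 0.1) and pair each bucket's mean forecast with the empirical
# frequency of the event in that bucket.
import numpy as np

def calibration_curve(probs, outcomes, bucket=0.1):
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    n_buckets = int(round(1.0 / bucket))
    bins = np.minimum((probs / bucket).astype(int), n_buckets - 1)
    # One (mean forecast, empirical frequency) point per non-empty bucket.
    return [(probs[bins == b].mean(), outcomes[bins == b].mean())
            for b in np.unique(bins)]
```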
Researcher Affiliation: Academia
LLM Response: "Volodymyr Kuleshov, Department of Computer Science, Stanford University, Stanford, CA 94305; Percy Liang, Department of Computer Science, Stanford University, Stanford, CA 94305"
Pseudocode: Yes
LLM Response: "Algorithm 1 Recalibration procedure for calibrated structured prediction.
Input: Features φ(x, E) from trained model p_θ, event set I(x), recalibration set S = {(xᵢ, yᵢ)}ᵢ₌₁ⁿ.
Output: Forecaster F(x, E).
Construct the events dataset: S_binary = {(φ(x, E), I[y ∈ E]) : (x, y) ∈ S, E ∈ I(x)}.
Train the forecaster F (e.g., k-NN or decision trees) on S_binary."
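Algorithm 1 translates almost line for line into code. The sketch below is a minimal illustration, not the authors' released implementation; `phi`, `events`, and `contains` are hypothetical stand-ins for φ(x, E), the event set I(x), and the membership test I[y ∈ E], which the paper derives from the trained structured model p_θ:

```python
# Minimal sketch of Algorithm 1 (recalibration). Assumed helpers:
#   phi(x, E)      -> feature vector for event E of input x   (phi(x, E))
#   events(x)      -> iterable of events of interest          (I(x))
#   contains(E, y) -> 1 if label y satisfies event E, else 0  (I[y in E])
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def recalibrate(recalibration_set, phi, events, contains, k=100):
    """Train a forecaster F(x, E) on the binary events dataset S_binary."""
    X, z = [], []
    for x, y in recalibration_set:      # (x, y) in S
        for E in events(x):             # E in I(x)
            X.append(phi(x, E))
            z.append(contains(E, y))    # binary outcome I[y in E]
    forecaster = KNeighborsClassifier(n_neighbors=k)  # or a decision tree
    forecaster.fit(np.asarray(X), np.asarray(z))
    return forecaster

# F(x, E) is then read off as the predicted event probability:
# p = forecaster.predict_proba([phi(x, E)])[0, 1]
```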
Open Source Code: Yes
LLM Response: "All code, data, and experiments for this paper are available on CodaLab at https://www.codalab.org/worksheets/0xecc9a01cfcbc4cd6b0444a92d259a87c/."
Open Datasets: Yes
LLM Response: "We perform our experiments on the CIFAR-10 dataset [15], which consists of 60,000 32×32 color images of different types of animals and vehicles (ten classes in total)."
Dataset Splits: Yes
LLM Response: "38,000 images were used for training, 2,000 for calibration, and 20,000 for testing."
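A sketch of that split, assuming `images` and `labels` hold the full 60,000-example dataset as arrays; the shuffling seed is illustrative:

```python
# Sketch of the reported 38,000 / 2,000 / 20,000 split of CIFAR-10.
import numpy as np

rng = np.random.default_rng(0)          # illustrative seed
perm = rng.permutation(60_000)
train_idx = perm[:38_000]               # 38,000 images for training
calib_idx = perm[38_000:40_000]         # 2,000 images for recalibration
test_idx = perm[40_000:]                # 20,000 images for testing
# e.g., X_train, y_train = images[train_idx], labels[train_idx]
```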
Hardware Specification: No
LLM Response: The paper describes running experiments but does not specify the hardware used (e.g., CPU/GPU models or memory).
Software Dependencies: No
LLM Response: The paper mentions software components and algorithms such as linear SVMs, CRFs, k-NN, decision trees, and AD3, but does not give version numbers for these dependencies or libraries.
Experiment Setup: Yes
LLM Response: "We use decision trees and k-NN as our recalibration algorithms... We further discretize probabilities into buckets of size 0.1... For each N and each algorithm we choose a hyperparameter (minimum leaf size for decision trees, k in k-NN) by 10-fold cross-validation on S. We tried values between 5 and 500 in increments of 5."
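That sweep is a plain grid search; below is a minimal sketch with scikit-learn, where `X_binary` and `z_binary` stand for the events dataset S_binary from Algorithm 1 (names assumed, not from the paper):

```python
# Sketch of the reported hyperparameter search: pick k (k-NN) or the
# minimum leaf size (decision trees) by 10-fold cross-validation over
# the values 5, 10, ..., 500.
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

grid = list(range(5, 501, 5))           # 5 to 500 in increments of 5

knn_search = GridSearchCV(KNeighborsClassifier(),
                          {"n_neighbors": grid}, cv=10)
tree_search = GridSearchCV(DecisionTreeClassifier(),
                           {"min_samples_leaf": grid}, cv=10)

# knn_search.fit(X_binary, z_binary)
# best_k = knn_search.best_params_["n_neighbors"]
```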