Calibrated Structured Prediction

Authors: Volodymyr Kuleshov, Percy S. Liang

NeurIPS 2015

Each reproducibility variable is listed below with its result and the supporting LLM response.
Research Type: Experimental
LLM Response: "We test our proposed recalibrators and features on three real-world tasks. Multiclass image classification. ... Optical character recognition. ... Scene understanding. ... We report results using calibration curves..."
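The calibration curves mentioned in that excerpt bin forecasts by predicted probability and compare each bin's mean forecast with the empirical frequency of the event; a well-calibrated forecaster lies on the diagonal. Below is a minimal sketch, not the paper's code, using the 0.1 bucket width quoted in the experiment setup further down:

```python
# Sketch of a calibration curve: group forecasts into probability buckets
# (width 0.1) and pair each bucket's mean forecast with the empirical
# frequency of the event in that bucket.
import numpy as np

def calibration_curve(probs, outcomes, bucket=0.1):
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    n_buckets = int(round(1.0 / bucket))
    bins = np.minimum((probs / bucket).astype(int), n_buckets - 1)
    # One (mean forecast, empirical frequency) point per non-empty bucket.
    return [(probs[bins == b].mean(), outcomes[bins == b].mean())
            for b in np.unique(bins)]
```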
Researcher Affiliation: Academia
LLM Response: "Volodymyr Kuleshov, Department of Computer Science, Stanford University, Stanford, CA 94305; Percy Liang, Department of Computer Science, Stanford University, Stanford, CA 94305"
Pseudocode: Yes
LLM Response: "Algorithm 1 Recalibration procedure for calibrated structured prediction.
Input: Features φ(x, E) from trained model p_θ, event set I(x), recalibration set S = {(xᵢ, yᵢ)}ᵢ₌₁ⁿ.
Output: Forecaster F(x, E).
Construct the events dataset: S_binary = {(φ(x, E), I[y ∈ E]) : (x, y) ∈ S, E ∈ I(x)}.
Train the forecaster F (e.g., k-NN or decision trees) on S_binary."
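Algorithm 1 translates almost line for line into code. The sketch below is a minimal illustration, not the authors' released implementation; `phi`, `events`, and `contains` are hypothetical stand-ins for φ(x, E), the event set I(x), and the membership test I[y ∈ E], which the paper derives from the trained structured model p_θ:

```python
# Minimal sketch of Algorithm 1 (recalibration). Assumed helpers:
#   phi(x, E)      -> feature vector for event E of input x   (phi(x, E))
#   events(x)      -> iterable of events of interest          (I(x))
#   contains(E, y) -> 1 if label y satisfies event E, else 0  (I[y in E])
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def recalibrate(recalibration_set, phi, events, contains, k=100):
    """Train a forecaster F(x, E) on the binary events dataset S_binary."""
    X, z = [], []
    for x, y in recalibration_set:      # (x, y) in S
        for E in events(x):             # E in I(x)
            X.append(phi(x, E))
            z.append(contains(E, y))    # binary outcome I[y in E]
    forecaster = KNeighborsClassifier(n_neighbors=k)  # or a decision tree
    forecaster.fit(np.asarray(X), np.asarray(z))
    return forecaster

# F(x, E) is then read off as the predicted event probability:
# p = forecaster.predict_proba([phi(x, E)])[0, 1]
```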
Open Source Code: Yes
LLM Response: "All code, data, and experiments for this paper are available on CodaLab at https://www.codalab.org/worksheets/0xecc9a01cfcbc4cd6b0444a92d259a87c/."
Open Datasets: Yes
LLM Response: "We perform our experiments on the CIFAR-10 dataset [15], which consists of 60,000 32×32 color images of different types of animals and vehicles (ten classes in total)."
Dataset Splits: Yes
LLM Response: "38,000 images were used for training, 2,000 for calibration, and 20,000 for testing."
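A sketch of that split, assuming `images` and `labels` hold the full 60,000-example dataset as arrays; the shuffling seed is illustrative:

```python
# Sketch of the reported 38,000 / 2,000 / 20,000 split of CIFAR-10.
import numpy as np

rng = np.random.default_rng(0)          # illustrative seed
perm = rng.permutation(60_000)
train_idx = perm[:38_000]               # 38,000 images for training
calib_idx = perm[38_000:40_000]         # 2,000 images for recalibration
test_idx = perm[40_000:]                # 20,000 images for testing
# e.g., X_train, y_train = images[train_idx], labels[train_idx]
```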
Hardware Specification: No
LLM Response: The paper describes running experiments but does not specify the hardware used (e.g., CPU/GPU models or memory).
Software Dependencies: No
LLM Response: The paper mentions software components and algorithms such as linear SVMs, CRFs, k-NN, decision trees, and AD3, but does not give version numbers for these dependencies or libraries.
Experiment Setup: Yes
LLM Response: "We use decision trees and k-NN as our recalibration algorithms... We further discretize probabilities into buckets of size 0.1... For each N and each algorithm we choose a hyperparameter (minimum leaf size for decision trees, k in k-NN) by 10-fold cross-validation on S. We tried values between 5 and 500 in increments of 5."
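That sweep is a plain grid search; below is a minimal sketch with scikit-learn, where `X_binary` and `z_binary` stand for the events dataset S_binary from Algorithm 1 (names assumed, not from the paper):

```python
# Sketch of the reported hyperparameter search: pick k (k-NN) or the
# minimum leaf size (decision trees) by 10-fold cross-validation over
# the values 5, 10, ..., 500.
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

grid = list(range(5, 501, 5))           # 5 to 500 in increments of 5

knn_search = GridSearchCV(KNeighborsClassifier(),
                          {"n_neighbors": grid}, cv=10)
tree_search = GridSearchCV(DecisionTreeClassifier(),
                           {"min_samples_leaf": grid}, cv=10)

# knn_search.fit(X_binary, z_binary)
# best_k = knn_search.best_params_["n_neighbors"]
```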