Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

PAC Confidence Sets for Deep Neural Networks via Calibrated Prediction

Authors: Sangdon Park, Osbert Bastani, Nikolai Matni, Insup Lee

ICLR 2020

Reproducibility Variable Result LLM Response
Research Type Experimental Finally, we evaluate our approach on three benchmarks: ResNet (He et al., 2016) for ImageNet (Russakovsky et al., 2015), a model (Held et al., 2016) learned for a visual object tracking benchmark (Wu et al., 2013), and a probabilistic dynamics model (Chua et al., 2018) learned for the half-cheetah environment (Brockman et al., 2016) (Section 4).
Researcher Affiliation Academia Sangdon Park (University of Pennsylvania), Osbert Bastani (University of Pennsylvania), Nikolai Matni (University of Pennsylvania), Insup Lee (University of Pennsylvania)
Pseudocode Yes Algorithm 1: Algorithm for solving (3). procedure EstimateConfidenceSetPredictor(Z_train, Z'_train, Z_val)
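Algorithm 1 is only named in this excerpt. The following is a minimal Python sketch of the binomial-tail construction commonly used for this kind of (ε, δ) PAC guarantee, under stated assumptions. It assumes confidence sets of the form C_τ(x) = {y : p(y | x) ≥ τ} built from already-calibrated probabilities. The calibration step itself is omitted. The threshold τ is chosen as large as possible while the number of validation errors stays within an error budget k* derived from the binomial CDF. The function names (`max_allowed_errors`, `select_threshold`, `confidence_set`) are hypothetical and not taken from the authors' repository, and the paper's exact formulation may differ in details such as strict versus non-strict inequalities.

```python
import math
from typing import List, Sequence


def log_binom_pmf(i: int, n: int, eps: float) -> float:
    """log P[Binomial(n, eps) = i] via lgamma, avoiding overflow/underflow (needs 0 < eps < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(eps) + (n - i) * math.log1p(-eps))


def max_allowed_errors(n: int, eps: float, delta: float) -> int:
    """Largest k with P[Binomial(n, eps) <= k] <= delta; returns -1 if even k = 0 fails."""
    cdf, k_star = 0.0, -1
    for k in range(n + 1):
        cdf += math.exp(log_binom_pmf(k, n, eps))  # running binomial CDF
        if cdf > delta:
            break
        k_star = k
    return k_star


def select_threshold(p_true: Sequence[float], k_star: int) -> float:
    """Largest tau such that at most k_star validation labels have p(y_i | x_i) < tau."""
    scores = sorted(p_true)
    if k_star < 0:
        return 0.0          # no error budget: keep every label (trivial set)
    if k_star >= len(scores):
        return math.inf     # budget covers all of the validation set: empty sets qualify
    return scores[k_star]   # at most k_star scores fall strictly below this value


def confidence_set(probs: Sequence[float], tau: float) -> List[int]:
    """C_tau(x) = {y : p(y | x) >= tau} for one example's calibrated class probabilities."""
    return [y for y, p in enumerate(probs) if p >= tau]
```

Here `p_true` holds the calibrated probability of the true label for each validation example. A typical call is `tau = select_threshold(p_true, max_allowed_errors(len(p_true), eps, delta))`. Accumulating the CDF incrementally keeps the cost linear in k*, and working in log space keeps terms such as (1 − ε)^n from underflowing for large n.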
Open Source Code Yes Our code is available at https://github.com/sangdon/PAC-confidence-set.
Open Datasets Yes Finally, we evaluate our approach on three benchmarks: ResNet (He et al., 2016) for ImageNet (Russakovsky et al., 2015), a model (Held et al., 2016) learned for a visual object tracking benchmark (Wu et al., 2013), and a probabilistic dynamics model (Chua et al., 2018) learned for the half-cheetah environment (Brockman et al., 2016) (Section 4).
Dataset Splits Yes We randomly split these sequences to form the training set for calibration, the validation set for confidence set estimation, and the test set for evaluation. Within each sequence, each pair of adjacent frames constitutes a single example. Our training dataset contains 20,882 labeled examples, each consisting of a pair of consecutive images and their ground-truth bounding boxes. The validation set (used for confidence set estimation) and the test set each contain 22,761 labeled examples.
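The split described above is made at the level of whole sequences, and each sequence is then turned into adjacent-frame examples. The following is a minimal Python sketch of that procedure under stated assumptions. The frame format, the equal default split fractions, and the function names are hypothetical, since the excerpt reports only the resulting example counts. This is not the authors' code.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical example format: (image path, bounding box as x, y, w, h).
Frame = Tuple[str, Tuple[float, float, float, float]]
Pair = Tuple[Frame, Frame]


def adjacent_pairs(frames: Sequence[Frame]) -> List[Pair]:
    """One example per pair of consecutive frames within a single sequence."""
    return [(frames[t], frames[t + 1]) for t in range(len(frames) - 1)]


def split_by_sequence(
    sequences: Dict[str, Sequence[Frame]],
    train_frac: float = 1 / 3,   # hypothetical; the source does not state fractions
    val_frac: float = 1 / 3,     # the test split takes the remaining sequences
    seed: int = 0,
) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Randomly assign whole sequences to train/val/test, then build frame pairs.

    Splitting by sequence keeps frames of one video from landing in two splits.
    """
    ids = sorted(sequences)                 # deterministic order before shuffling
    random.Random(seed).shuffle(ids)
    n_train = round(train_frac * len(ids))
    n_val = round(val_frac * len(ids))
    groups = (ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:])
    train, val, test = (
        [pair for sid in group for pair in adjacent_pairs(sequences[sid])]
        for group in groups
    )
    return train, val, test
```

Because the split happens before pairing, the three example counts depend on sequence lengths and need not match the fractions exactly. This is consistent with the unequal training and validation counts reported above.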
Hardware Specification No The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. It only discusses the neural networks and datasets used.
Software Dependencies No The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. It mentions various models and datasets but no software versions.
Experiment Setup Yes We use our algorithm to compute confidence sets for ResNet (He et al., 2016) on ImageNet (Russakovsky et al., 2015), for ϵ = 0.01, δ = 10⁻⁵, and n = 20000 validation images.
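To see what these parameters imply, they can be plugged into the binomial-tail condition commonly used for this kind of PAC guarantee. That condition is assumed here; the paper states its exact form. It gives the largest number of validation errors k* that the chosen threshold may incur:

```latex
k^* = \max\Big\{ k \in \mathbb{N} : \sum_{i=0}^{k} \binom{n}{i}\, \epsilon^{i} (1-\epsilon)^{n-i} \le \delta \Big\},
\qquad n = 20000,\quad \epsilon = 0.01,\quad \delta = 10^{-5}.
```

The expected error count is nε = 200, with standard deviation √(nε(1 − ε)) = √198 ≈ 14.1. A rough normal approximation (z ≈ 4.26 for a one-sided 10⁻⁵ tail) places k* near 200 − 4.26 × 14.1 ≈ 140. On this estimate, certifying a 1% error rate requires roughly 0.7% or fewer errors on the validation images. The exact value should be computed from the binomial CDF rather than from this approximation.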