Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
PAC Confidence Sets for Deep Neural Networks via Calibrated Prediction
Authors: Sangdon Park, Osbert Bastani, Nikolai Matni, Insup Lee
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we evaluate our approach on three benchmarks: ResNet (He et al., 2016) for ImageNet (Russakovsky et al., 2015), a model (Held et al., 2016) learned for a visual object tracking benchmark (Wu et al., 2013), and a probabilistic dynamics model (Chua et al., 2018) learned for the half-cheetah environment (Brockman et al., 2016) (Section 4). |
| Researcher Affiliation | Academia | Sangdon Park, University of Pennsylvania; Osbert Bastani, University of Pennsylvania; Nikolai Matni, University of Pennsylvania; Insup Lee, University of Pennsylvania |
| Pseudocode | Yes | Algorithm 1: Algorithm for solving (3). procedure EstimateConfidenceSetPredictor(Z_train, Z′_train, Z_val) |
| Open Source Code | Yes | Our code is available at https://github.com/sangdon/PAC-confidence-set. |
| Open Datasets | Yes | Finally, we evaluate our approach on three benchmarks: ResNet (He et al., 2016) for ImageNet (Russakovsky et al., 2015), a model (Held et al., 2016) learned for a visual object tracking benchmark (Wu et al., 2013), and a probabilistic dynamics model (Chua et al., 2018) learned for the half-cheetah environment (Brockman et al., 2016) (Section 4). |
| Dataset Splits | Yes | We randomly split these sequences to form the training set for calibration, validation set for confidence set estimation, and test set for evaluation. For each sequence, a pair of adjacent frames constitutes a single example. Our training dataset contains 20,882 labeled examples, each consisting of a pair of consecutive images and ground truth bounding boxes. The validation set for confidence set estimation and test set contain 22,761 and 22,761 labeled examples, respectively. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. It only discusses the neural networks and datasets used. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. It mentions various models and datasets but no software versions. |
| Experiment Setup | Yes | We use our algorithm to compute confidence sets for ResNet (He et al., 2016) on ImageNet (Russakovsky et al., 2015), for ϵ = 0.01, δ = 10⁻⁵, and n = 20000 validation images. |
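The PAC criterion referenced by the pseudocode and experiment-setup rows (ϵ = 0.01, δ = 10⁻⁵, n = 20000) can be illustrated with a minimal sketch. It assumes confidence sets of the form {y : p̂(y | x) ≥ t} for a calibrated model p̂, and chooses t so that at most k* validation labels fall outside their sets, where k* is taken to be the largest k whose Binomial(n, ϵ) lower-tail probability stays below δ. The function names, the probability-threshold parameterization, and the exact form of the bound are assumptions for illustration; this is not the authors' implementation (see the linked repository for that).

```python
import math
from typing import Dict, Sequence, Set


def max_allowed_errors(n: int, eps: float, delta: float) -> int:
    """Hypothetical helper: largest k with P[Binomial(n, eps) <= k] < delta.

    Returns -1 when even k = 0 violates the bound (n too small for this eps/delta).
    Assumes 0 < eps < 1.
    """
    log_binom_base = math.lgamma(n + 1)
    cdf = 0.0
    k_star = -1
    for k in range(n + 1):
        # Binomial pmf computed in log space to limit underflow for large n.
        log_pmf = (log_binom_base - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(eps) + (n - k) * math.log1p(-eps))
        cdf += math.exp(log_pmf)
        if cdf < delta:
            k_star = k
        else:
            break  # the CDF only grows, so no larger k can qualify
    return k_star


def choose_threshold(true_label_probs: Sequence[float], k_star: int) -> float:
    """Hypothetical helper: pick t so that #{i : p_i < t} <= k_star.

    true_label_probs[i] is the calibrated probability of the true label of
    validation example i; a label is missed by {y : p(y|x) >= t} iff p < t.
    """
    if k_star < 0:
        return 0.0  # no error budget: fall back to including every label
    s = sorted(true_label_probs)
    # Index clamped to stay in range; t = s[idx] leaves at most idx values strictly below it.
    idx = min(k_star, len(s) - 1)
    return s[idx]


def confidence_set(label_probs: Dict[str, float], t: float) -> Set[str]:
    """Labels whose calibrated probability is at least the threshold t."""
    return {y for y, p in label_probs.items() if p >= t}


if __name__ == "__main__":
    k_star = max_allowed_errors(n=20000, eps=0.01, delta=1e-5)
    print("allowed validation errors:", k_star)
```

Because the binomial CDF is monotone in k, the loop can stop at the first k that breaks the bound; for n = 20000 and ϵ = 0.01 it terminates well below the mean of 200 errors. Larger thresholds give smaller sets, so taking the largest t that respects the error budget is what keeps the sets tight.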