Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

PAC Prediction Sets for Meta-Learning

Authors: Sangdon Park, Edgar Dobriban, Insup Lee, Osbert Bastani

NeurIPS 2022 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the efficacy of our approach on four datasets across three application domains: mini-ImageNet and CIFAR10-C in the visual domain, FewRel in the language domain, and the CDC Heart Dataset in the medical domain. In particular, our prediction sets satisfy the PAC guarantee while having smaller size compared to other baselines that also satisfy this guarantee. 5 Experiments
Researcher Affiliation	Academia	Sangdon Park School of Cybersecurity and Privacy Georgia Institute of Technology EMAIL, Edgar Dobriban Dept. of Statistics & Data Science The Wharton School University of Pennsylvania EMAIL, Insup Lee Dept. of Computer & Info. Science PRECISE Center University of Pennsylvania EMAIL, Osbert Bastani Dept. of Computer & Info. Science PRECISE Center University of Pennsylvania EMAIL
Pseudocode	Yes	Algorithm 1 Meta-PS: PAC prediction set for meta-learning. Internally, any PAC prediction set algorithms are used to implement an estimator γ̂_{ε,δ}; we use PS-BINOM (Algorithm 2) in Appendix B.
Open Source Code	No	Code will be released once accepted along with data and precise instructions to run it.
Open Datasets	Yes	We demonstrate the efficacy of our approach on four datasets across three application domains: mini-ImageNet [10] and CIFAR10-C [15] in the visual domain, FewRel [16] in the language domain, and the CDC Heart Dataset [17] in the medical domain.
Dataset Splits	Yes	Mini-ImageNet consists of 100 classes with 64 classes for training, 16 classes for calibration, and 20 classes for testing; each class has 600 images. In calibration, we have N = 500 calibration task datasets randomly drawn from the possible tasks, and use 5 shots for adaptation and 500 shots for calibration (i.e., t = 25 and n = 2500). We consider data from 2011-2014 as the training task distributions... and data from 2015-2019 as calibration task distributions...
Hardware Specification	Yes	Experiments are conducted on a cluster with Nvidia Quadro RTX 6000 and Nvidia A100 GPUs.
Software Dependencies	No	The paper mentions using a 'prototypical network' and other frameworks but does not provide specific version numbers for software dependencies such as libraries, programming languages, or solvers.
Experiment Setup	Yes	We consider k-shot c-way learning except for the CDC Heart Dataset; in particular, there are c classes for each task dataset, and k adaptation examples for each class. Thus, we have t := kc labeled examples to adapt a model to a new task. Parameters are N = 500, n = 2500, t = 25, ε = 0.1, α = 0.1, δ = 10^{-5} for Meta-PS and ε = 0.1, δ = 10^{-5} for the other methods.
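
The quoted setup gives t := kc and reports t = 25 and n = 2500 for 5 adaptation shots and 500 calibration shots. The minimal Python sketch below checks that arithmetic. The function names are hypothetical and are not taken from the paper or its code, and the value c = 5 is not stated in the excerpts above; it is inferred here from t = 25 with k = 5.

```python
def labeled_examples(shots_per_class: int, num_classes: int) -> int:
    """Number of labeled examples in a k-shot c-way task dataset: k * c."""
    if shots_per_class <= 0 or num_classes <= 0:
        raise ValueError("shots_per_class and num_classes must be positive")
    return shots_per_class * num_classes


def infer_num_classes(total_examples: int, shots_per_class: int) -> int:
    """Recover c from a reported total and a per-class shot count."""
    if shots_per_class <= 0:
        raise ValueError("shots_per_class must be positive")
    if total_examples % shots_per_class != 0:
        raise ValueError("total is not a multiple of the per-class shot count")
    return total_examples // shots_per_class


# Assumption: c is inferred from the reported t = 25 with k = 5 adaptation shots.
num_classes = infer_num_classes(total_examples=25, shots_per_class=5)

# Adaptation set size t and calibration set size n per task, as in the setup row.
t = labeled_examples(shots_per_class=5, num_classes=num_classes)
n = labeled_examples(shots_per_class=500, num_classes=num_classes)

print(f"c = {num_classes}, t = {t}, n = {n}")
```

Under this reading, both reported sizes are consistent with the same number of classes per task; the CDC Heart Dataset is excluded from the k-shot c-way setup, so the sketch does not apply to it.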