Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Approximating Full Conformal Prediction at Scale via Influence Functions
Authors: Javier Abad Martinez, Umang Bhatt, Adrian Weller, Giovanni Cherubin
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove that our method is a consistent approximation of full CP, and empirically show that the approximation error becomes smaller as the training set increases; e.g., for 1,000 training points the two methods output p-values that are < 0.001 apart: a negligible error for any practical application. Our methods enable scaling full CP to large real-world datasets. We compare our full CP approximation (ACP) to mainstream CP alternatives, and observe that our method is computationally competitive whilst enjoying the statistical predictive power of full CP. |
| Researcher Affiliation | Collaboration | ¹ETH Zurich, Switzerland; ²University of Cambridge, UK; ³The Alan Turing Institute, London, UK; ⁴Microsoft Research, Cambridge, UK |
| Pseudocode | Yes | Algorithm 1: Full CP; Algorithm 2: Approximate full CP (ACP) |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | We empirically demonstrate that ACP is competitive with existing methods on MNIST (Le Cun 1998), CIFAR10 (Krizhevsky, Nair, and Hinton 2009), and US Census (Ding et al. 2021). [...] We use the standard test/train split of MNIST (60,000 training, 10,000 testing), and CIFAR-10 (50,000 training, 10,000 testing). |
| Dataset Splits | Yes | We use the standard test/train split of MNIST (60,000 training, 10,000 testing), and CIFAR-10 (50,000 training, 10,000 testing). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | Yes | We used Python 3.9.7, PyTorch 1.10.0+cu113, and numpy 1.22.3. |
| Experiment Setup | Yes | For all models, we set the learning rate to 0.001, and train the model for 100 epochs. |
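
The table refers to full CP p-values without showing how one is computed. The sketch below illustrates the general full conformal prediction recipe for classification and is not a reproduction of the paper's Algorithm 1 or of ACP. The nearest-centroid nonconformity measure and the names `nonconformity`, `full_cp_pvalues`, and `prediction_set` are assumptions made for this example; the paper's experiments use trained models instead.

```python
import numpy as np


def nonconformity(x, y, X_rest, y_rest):
    """Hypothetical measure: distance from x to the centroid of class y
    among the remaining examples (larger means less conforming)."""
    members = X_rest[y_rest == y]
    if len(members) == 0:
        # No other example carries this label, so treat x as maximally strange.
        return np.inf
    return float(np.linalg.norm(x - members.mean(axis=0)))


def full_cp_pvalues(X_train, y_train, x_test, labels):
    """Return one p-value per candidate label for a single test point."""
    pvalues = {}
    for label in labels:
        # Tentatively assign the candidate label to the test point.
        X_aug = np.vstack([X_train, x_test[None, :]])
        y_aug = np.append(y_train, label)
        n_aug = len(y_aug)
        scores = np.empty(n_aug)
        for i in range(n_aug):
            # Score each example against the augmented set without itself.
            mask = np.arange(n_aug) != i
            scores[i] = nonconformity(X_aug[i], y_aug[i], X_aug[mask], y_aug[mask])
        # Fraction of examples at least as nonconforming as the test point.
        pvalues[label] = float(np.mean(scores >= scores[-1]))
    return pvalues


def prediction_set(pvalues, epsilon):
    """Keep every label whose p-value exceeds the significance level."""
    return [label for label, p in pvalues.items() if p > epsilon]
```

Every candidate label triggers a full pass of leave-one-out scoring, and with a trained model each score would require retraining. That cost is what makes exact full CP expensive on large datasets and is the motivation for an approximation.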