Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Length Optimization in Conformal Prediction
Authors: Shayan Kiyani, George J. Pappas, Hamed Hassani
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive empirical evaluations demonstrate the superior prediction set size performance of CPL compared to state-of-the-art methods across diverse real-world and synthetic datasets in classification, regression, and large language model-based multiple choice question answering. |
| Researcher Affiliation | Academia | Shayan Kiyani, George Pappas, Hamed Hassani Department of Electrical and Systems Engineering University of Pennsylvania EMAIL |
| Pseudocode | Yes | Algorithm 1 Conformal Prediction with Length-Optimization (CPL) |
| Open Source Code | Yes | An Implementation of our algorithm can be accessed at the following link: https://github.com/shayankiyani98/CP. |
| Open Datasets | Yes | We use multiple-choice question answering datasets, including TruthfulQA [51], MMLU [52], OpenBookQA [53], PIQA [54], and BIG-bench [55]. |
| Dataset Splits | Yes | We generate 150K training samples, 50K calibration data points, and 50K test data points. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware used for running the experiments (e.g., CPU/GPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions software such as 'Python notebook', 'Llama 2', 'GPT-2', and 'ResNet50 model', but it does not provide specific version numbers for these or any other key software dependencies (e.g., PyTorch version, CUDA version). |
| Experiment Setup | Yes | We use a 2-hidden-layer NN with layers of 20 and 10 neurons for the inner maximization. |
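
To make the figures in the Dataset Splits and Experiment Setup rows concrete, here is a minimal sketch, not the authors' code. It partitions 250K sample indices into the quoted 150K/50K/50K train, calibration, and test sets, and counts the parameters of a fully connected network with hidden layers of 20 and 10 neurons. The function names, the seed, and the input and output dimensions are assumptions made for illustration. The paper generates its synthetic samples directly, so shuffling a pool of indices stands in for that step.

```python
import random

# Split sizes quoted in the "Dataset Splits" row.
N_TRAIN, N_CAL, N_TEST = 150_000, 50_000, 50_000

# Hidden-layer widths quoted in the "Experiment Setup" row.
HIDDEN_SIZES = (20, 10)


def split_indices(n_train, n_cal, n_test, seed=0):
    """Shuffle indices and partition them into disjoint train/calibration/test lists.

    The seed is a hypothetical choice; the source does not state one.
    """
    total = n_train + n_cal + n_test
    idx = list(range(total))
    random.Random(seed).shuffle(idx)
    train = idx[:n_train]
    cal = idx[n_train:n_train + n_cal]
    test = idx[n_train + n_cal:]
    return train, cal, test


def mlp_param_count(input_dim, hidden_sizes, output_dim):
    """Count the weights and biases of a fully connected network.

    input_dim and output_dim are assumptions; the source only gives
    the two hidden-layer widths.
    """
    dims = [input_dim, *hidden_sizes, output_dim]
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))


train_idx, cal_idx, test_idx = split_indices(N_TRAIN, N_CAL, N_TEST)
n_params = mlp_param_count(1, HIDDEN_SIZES, 1)
```

Keeping the calibration indices disjoint from the training indices matters for conformal methods in general, since calibration scores are computed on data the fitted model has not seen.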