Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Prediction-Powered Adaptive Shrinkage Estimation
Authors: Sida Li, Nikolaos Ignatiadis
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both synthetic and real-world datasets show that PAS adapts to the reliability of the ML predictions and outperforms traditional and modern baselines in large-scale applications. We conduct extensive experiments on both synthetic and real-world datasets. |
| Researcher Affiliation | Academia | ¹Data Science Institute, The University of Chicago; ²Department of Statistics, The University of Chicago. Correspondence to: Sida Li <EMAIL>. |
| Pseudocode | Yes | A pseudo-code implementation is also presented in Algorithm 1. |
| Open Source Code | Yes | The code for reproducing the experiments is available at https://github.com/listar2000/predictionpowered-adaptive-shrinkage. |
| Open Datasets | Yes | Experiments on both synthetic and real-world datasets show that PAS adapts to the reliability of the ML predictions and outperforms traditional and modern baselines in large-scale applications. Fisch et al. (2024) have shown improvements in estimating the fraction of spiral galaxies using predictions on images from the Galaxy Zoo 2 dataset (Willett et al., 2013). Amazon Review Ratings (SNAP, 2014). The Amazon Fine Food Reviews dataset, provided by the Stanford Network Analysis Project (SNAP; SNAP (2014)) on Kaggle. |
| Dataset Splits | Yes | We randomly split the data points of each problem into labeled/unlabeled partitions (where we choose a 20/80 split ratio). For both datasets, we randomly split the data for each problem (a food product or galaxy subgroup) into a labeled and unlabeled partition with a 20/80 ratio. |
| Hardware Specification | Yes | All the experiments were conducted on a compute cluster with Intel Xeon Silver 4514Y (16 cores) CPU, Nvidia A100 (80GB) GPU, and 64GB of memory. |
| Software Dependencies | No | The paper mentions software such as 'Hugging Face's transformers library (Wolf, 2019)', 'bert-base-multilingual-uncased-sentiment model (Town, 2023)', 'ResNet50 architecture (He et al., 2016)', and 'Adam optimizer (Kingma & Ba, 2015)'. However, it does not give version numbers for these libraries or frameworks, which a reproducible description of ancillary software requires. |
| Experiment Setup | Yes | We use a batch size of 256 and Adam optimizer (Kingma & Ba, 2015) with a learning rate of 1e-3. After 20 epochs, the model achieves 87% training accuracy and 83% test accuracy. |
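The Dataset Splits row describes a per-problem random 20/80 labeled/unlabeled partition. The snippet below is a minimal sketch of that procedure under stated assumptions; it is not taken from the authors' repository. The function names `split_labeled_unlabeled` and `split_all_problems`, the seeding scheme, and the use of `round` to size the labeled set are all hypothetical choices made for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_labeled_unlabeled(
    items: Sequence[T], labeled_frac: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly partition one problem's data points into (labeled, unlabeled)."""
    if not 0.0 < labeled_frac < 1.0:
        raise ValueError("labeled_frac must be in (0, 1)")
    rng = random.Random(seed)
    shuffled = list(items)  # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    # Assumption: the labeled set size is rounded to the nearest integer.
    n_labeled = round(labeled_frac * len(shuffled))
    return shuffled[:n_labeled], shuffled[n_labeled:]


def split_all_problems(
    problems: Dict[str, Sequence[T]], labeled_frac: float = 0.2, seed: int = 0
) -> Dict[str, Tuple[List[T], List[T]]]:
    """Split each problem (e.g. a food product or galaxy subgroup) independently."""
    master = random.Random(seed)
    out: Dict[str, Tuple[List[T], List[T]]] = {}
    for name, items in problems.items():
        # Derive a separate seed per problem so splits are independent but reproducible.
        out[name] = split_labeled_unlabeled(items, labeled_frac, master.randrange(2**32))
    return out


if __name__ == "__main__":
    splits = split_all_problems({"product_a": list(range(10)), "product_b": list(range(25))})
    for name, (labeled, unlabeled) in splits.items():
        print(name, len(labeled), len(unlabeled))
```

Splitting each problem separately, rather than pooling all data first, keeps the 20/80 ratio within every product or subgroup, which matches the row's wording "for each problem".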
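The Experiment Setup row reports Adam with a learning rate of 1e-3 but does not state the other Adam hyperparameters. As a sketch of what that setting controls, the snippet below implements the standard Adam update (Kingma & Ba, 2015) for a single scalar parameter. It assumes the commonly used defaults beta1 = 0.9, beta2 = 0.999, and eps = 1e-8, which the row does not confirm. It uses a deterministic gradient rather than minibatches of 256, and `adam_minimize_1d` is a hypothetical helper name.

```python
import math
from typing import Callable


def adam_minimize_1d(
    grad: Callable[[float], float],
    x0: float,
    lr: float = 1e-3,
    steps: int = 1000,
    beta1: float = 0.9,    # assumed default, not stated in the paper
    beta2: float = 0.999,  # assumed default, not stated in the paper
    eps: float = 1e-8,     # assumed default, not stated in the paper
) -> float:
    """Run the Adam update rule on one scalar parameter and return its final value."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g      # EMA of gradients (first moment)
        v = beta2 * v + (1 - beta2) * g * g  # EMA of squared gradients (second moment)
        m_hat = m / (1 - beta1 ** t)         # bias correction for zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


if __name__ == "__main__":
    # Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
    print(adam_minimize_1d(lambda x: 2 * (x - 3), x0=0.0, lr=1e-3, steps=1))
    print(adam_minimize_1d(lambda x: 2 * (x - 3), x0=0.0, lr=1e-2, steps=3000))
```

Because of the normalization by the square root of v_hat, each early Adam step moves the parameter by roughly `lr`, independent of the gradient's scale. On the first step this is almost exactly 0.001 for the reported learning rate.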