Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Efficiently Verifiable Proofs of Data Attribution

Authors: Ari Karchmer, Seth Neel, Martin Pawelczyk

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This is a theory paper. Provable security or verifiability inherently requires theory, given that we provide guarantees against all possible adversaries. Future work on implementation may involve heuristics based on this work.
Researcher Affiliation | Collaboration | Morgan Stanley Machine Learning Research, Harvard Business School; Harvard Business School; Google Research, Harvard Business School
Pseudocode | Yes | Algorithm 1: Interactive PAC-Verification Protocol for Empirical Influence (Φ)
Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [NA] Justification: There is no data or code that goes with this paper.
Open Datasets | No | Question: Does the paper provide CONCRETE ACCESS INFORMATION (specific link, DOI, repository name, formal citation with authors/year, or reference to established benchmark datasets) for a publicly available or open dataset? Answer: [NA] Justification: There is no data or code that goes with this paper.
Dataset Splits | No | Question: Does the paper provide SPECIFIC DATASET SPLIT INFORMATION (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning? Answer: [NA] Justification: No models were trained or tested as part of this paper.
Hardware Specification | No | Question: Does the paper provide SPECIFIC HARDWARE DETAILS (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments? Answer: [NA] Justification: There are no experiments since it is primarily a theory paper.
Software Dependencies | No | Question: Does the paper provide SPECIFIC ANCILLARY SOFTWARE DETAILS (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment? Answer: [NA] Justification: There are no experiments since it is primarily a theory paper.
Experiment Setup | No | Question: Does the paper contain SPECIFIC EXPERIMENTAL SETUP DETAILS (concrete hyperparameter values, training configurations, or system-level settings) in the main text? Answer: [NA] Justification: No models were trained or tested as part of this paper.