Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Adaptive Data Analysis for Growing Data

Authors: Neil Marchant, Benjamin I.P. Rubinstein

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our bound empirically outperforms baselines composed from static bounds. In a batched query setting, the asymptotic data requirements of our bound grows with the square-root of the number of adaptive queries for a fixed accuracy goal (assuming the ratio of final to initial data size is held constant). This improvement matches the improvement of bounds for static data [15] over the data splitting baseline. 4.2 Empirical Comparison with Alternative Guarantees: We empirically compare our generalization bounds for growing data with baselines composed from bounds for static data.
Researcher Affiliation | Academia | Neil G. Marchant, School of Computing & Information Systems, University of Melbourne, Australia, EMAIL; Benjamin I. P. Rubinstein, School of Computing & Information Systems, University of Melbourne, Australia, EMAIL
Pseudocode | Yes | Algorithm 1: Interaction between A and M; Algorithm 2: Composition of Clipped Gaussian Mechanisms with zCDP Privacy Filter
Open Source Code | No | Our empirical results are obtained by evaluating mathematical expressions, so there is no data or code to release.
Open Datasets | No | Our empirical results are obtained by evaluating mathematical expressions, so there is no data or code to release.
Dataset Splits | No | We do not train or test models. When instantiating our bounds in Figures 3, 4 and 6, we specify parameter settings in the captions.
Hardware Specification | No | We do not conduct experiments that require significant compute resources.
Software Dependencies | No | Our empirical results are obtained by evaluating mathematical expressions, so there is no data or code to release.
Experiment Setup | No | We do not train or test models. When instantiating our bounds in Figures 3, 4 and 6, we specify parameter settings in the captions.