Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
When Data Can't Meet: Estimating Correlation Across Privacy Barriers
Authors: Abhinav Chakraborty, Arnab Auddy, T. Tony Cai
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results from extensive numerical experiments support our theoretical findings. ... 5 Numerical experiments. We evaluate our non interactive sign batch (NI) and interactive sign flip (INT) estimators across different parameter settings. |
| Researcher Affiliation | Academia | Abhinav Chakraborty Columbia University New York, NY 10027 EMAIL Arnab Auddy The Ohio State University Columbus, OH 43210 EMAIL T. Tony Cai The Wharton School University of Pennsylvania Philadelphia, PA 19104 EMAIL |
| Pseudocode | No | The paper describes methods and proofs using mathematical notation and narrative text, but it does not include any explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | All our codes can be found at https://github.com/abhinavc3/distributed-correlation. |
| Open Datasets | Yes | We illustrate our methods using data from the Health and Retirement Study (HRS), a longitudinal survey of older adults in the United States. |
| Dataset Splits | No | The paper describes simulation parameters like sample size and replications, and mentions using data from the Health and Retirement Study (HRS) for real data experiments, but it does not specify any training/test/validation splits for this real-world dataset. The simulation setup is not a dataset split in the conventional sense for model training/evaluation. |
| Hardware Specification | Yes | All experiments were done on a desktop with 32 GB RAM, and were done over the course of 1 hour. |
| Software Dependencies | No | The paper does not provide specific software names along with their version numbers (e.g., Python 3.8, PyTorch 1.9) that would be needed to replicate the experiments. It only mentions general tools and frameworks without version details. |
| Experiment Setup | Yes | Parameter Grid. We vary our parameters as below, with 250 replications for each cell: Sample size: n ∈ {1000, 1500, 2500, 4000, 6000, 9000}. Correlation: ρ ∈ {0, 0.15, 0.3, 0.4, 0.5, 0.65, 0.8, 0.9}. Privacy budget: (ε1, ε2) ∈ {(0.5, 0.5), (1, 1), (1.5, 0.5)}. |
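
To make the size of the simulation study in the Experiment Setup row concrete, the following minimal Python sketch enumerates the quoted parameter grid. The values are taken from the row above; the names `SAMPLE_SIZES`, `CORRELATIONS`, `PRIVACY_BUDGETS`, and `parameter_grid` are illustrative assumptions and are not taken from the authors' repository. The sketch only lists the cells and does not implement the NI or INT estimators.

```python
from itertools import product

# Grid values as quoted in the Experiment Setup row.
SAMPLE_SIZES = [1000, 1500, 2500, 4000, 6000, 9000]
CORRELATIONS = [0, 0.15, 0.3, 0.4, 0.5, 0.65, 0.8, 0.9]
PRIVACY_BUDGETS = [(0.5, 0.5), (1, 1), (1.5, 0.5)]  # (eps1, eps2) pairs
REPLICATIONS = 250  # replications per grid cell


def parameter_grid():
    """Yield one dict per cell of the grid (hypothetical helper)."""
    for n, rho, (eps1, eps2) in product(SAMPLE_SIZES, CORRELATIONS, PRIVACY_BUDGETS):
        yield {"n": n, "rho": rho, "eps1": eps1, "eps2": eps2}


cells = list(parameter_grid())
total_runs = len(cells) * REPLICATIONS
print(len(cells), total_runs)  # 144 cells, 36000 runs
```

Under this reading of the grid, there are 6 × 8 × 3 = 144 cells, so 250 replications per cell amounts to 36,000 simulated runs per estimator.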