Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Phase Transitions in the Detection of Correlated Databases
Authors: Dor Elimelech, Wasim Huleihel
ICML 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We determine sharp thresholds at which optimal testing exhibits a phase transition, depending on the asymptotic regime of $n$ and $d$. Specifically, we prove that if $\rho^2 d \to 0$, as $d \to \infty$, then weak detection (performing slightly better than random guessing) is statistically impossible, irrespectively of the value of $n$. This compliments the performance of a simple test that thresholds the sum all entries of $X^T Y$. Furthermore, when $d$ is fixed, we prove that strong detection (vanishing error probability) is impossible for any $\rho < \rho^\star$, where $\rho^\star$ is an explicit function of $d$, while weak detection is again impossible as long as $\rho^2 d = o(1)$, as $n \to \infty$. These results close significant gaps in current recent related studies. |
| Researcher Affiliation | Academia | ¹School of Electrical and Computer Engineering, Ben-Gurion University, Beer Sheva 84105, Israel. ²Department of Electrical Engineering-Systems, Tel Aviv University, Tel Aviv 6997801, Israel. |
| Pseudocode | No | The paper defines detection tests using mathematical notation (e.g., $\phi_{\mathsf{sum}}$ and $\phi_{\mathsf{count}}$) but does not provide them in a structured pseudocode block or algorithm environment. |
| Open Source Code | No | The paper does not contain any statements about releasing open-source code for the described methodology, nor does it provide any links to a code repository. |
| Open Datasets | No | The paper defines a probabilistic model for generating synthetic Gaussian databases (e.g., "$X_1, \ldots, X_n, Y_1, \ldots, Y_n \sim \mathcal{N}(0_d, I_{d \times d})$"). It does not use or refer to any publicly available or open datasets for training purposes. |
| Dataset Splits | No | As this is a theoretical paper that does not involve empirical experiments with real datasets, it does not describe any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not involve empirical experiments requiring specific hardware. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe software implementations or dependencies with specific version numbers. |
| Experiment Setup | No | The paper is theoretical and does not involve empirical experiments, thus it does not describe any experimental setup details such as hyperparameters or training configurations. |
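
The sum test quoted in the Research Type row (thresholding the sum of all entries of $X^T Y$) can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not the authors' implementation, since the paper releases no code. The function names, the d-by-n column layout, the construction of the correlated case (each record of Y is a $\rho$-correlated copy of a record of X, followed by a hidden column shuffle), and the threshold value are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_databases(n, d, rho, correlated, rng):
    """Draw two d-by-n standard Gaussian databases (each column is one record).

    Assumed model: if `correlated` is False the databases are independent.
    Otherwise each record of Y is a rho-correlated copy of a record of X,
    and the columns of Y are then shuffled by a hidden permutation.
    """
    X = rng.standard_normal((d, n))
    Z = rng.standard_normal((d, n))
    if not correlated:
        return X, Z
    Y = rho * X + np.sqrt(1.0 - rho**2) * Z
    return X, Y[:, rng.permutation(n)]


def sum_statistic(X, Y):
    """Sum of all entries of X^T Y.

    The sum over i, j of <X_i, Y_j> equals <sum_i X_i, sum_j Y_j>,
    so the n-by-n matrix never has to be formed and the value does
    not depend on how the columns of Y are ordered.
    """
    return float(X.sum(axis=1) @ Y.sum(axis=1))


def sum_test(X, Y, threshold):
    """Declare 'correlated' when the statistic exceeds the threshold."""
    return sum_statistic(X, Y) > threshold


rng = np.random.default_rng(0)
n, d, rho = 200, 50, 0.5
# Hypothetical threshold: half of rho * n * d, the mean of the statistic
# in the correlated case of the assumed model (it has mean 0 otherwise).
threshold = 0.5 * rho * n * d

X0, Y0 = sample_databases(n, d, rho, correlated=False, rng=rng)
X1, Y1 = sample_databases(n, d, rho, correlated=True, rng=rng)
print("independent pair flagged:", sum_test(X0, Y0, threshold))
print("correlated pair flagged:", sum_test(X1, Y1, threshold))
```

In this assumed model the statistic has mean $\rho n d$ in the correlated case and standard deviation $n\sqrt{d}$ in the independent case, so its signal-to-noise ratio scales like $\rho\sqrt{d}$. That is consistent with the quoted impossibility result for $\rho^2 d \to 0$.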