Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Optimal community detection in dense bipartite graphs
Authors: Julien Chhor, Parker Knight
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We consider the problem of detecting a community of densely connected vertices in a high-dimensional bipartite graph of size n1 n2. Under the null hypothesis, the observed graph is drawn from a bipartite Erd os-Renyi distribution with connection probability p0. Under the alternative hypothesis, there exists an unknown bipartite subgraph of size k1 k2 in which edges appear with probability p1 = p0 + δ for some δ > 0, while all other edges outside the subgraph appear with probability p0. Specifically, we provide non-asymptotic upper and lower bounds on the smallest signal strength δ that is both necessary and sufficient to ensure the existence of a test with small enough Type I and Type II errors. We also derive novel minimax-optimal tests achieving these fundamental limits when the underlying graph is sufficiently dense. Our proposed tests involve a combination of hard-thresholded nonlinear statistics of the adjacency matrix, the analysis of which may be of independent interest. In contrast with previous work, our non-asymptotic upper and lower bounds match for any configuration of n1, n2, k1, k2. |
| Researcher Affiliation | Academia | Julien Chhor Department of Mathematics and Statistics Toulouse School of Economics Toulouse, France EMAIL Parker Knight Department of Biostatistics Harvard University Boston, MA, USA EMAIL |
| Pseudocode | No | The paper describes mathematical definitions for various tests (e.g., total degree test, truncated degree test, max truncated degree test) using equations and textual descriptions, but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | No | Our max truncated degree test is unfeasible to compute on datasets of even moderate size. It is well established that the community detection problem in non-bipartite graphs exhibits a statistical-computational gap [46, 4, 31, 12, 22, 57, 26], meaning that it is not, in general, possible to optimally detect a planted community with a test that runs in polynomial time. We conjecture that this gap persists in the bipartite case, and establishing this phenomenon formally is an open problem of great interest. |
| Open Datasets | No | The paper focuses on theoretical analysis of community detection in random bipartite graphs and does not use any specific publicly available datasets for experimental validation. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with datasets, therefore no dataset splits are discussed. |
| Hardware Specification | No | The paper does not describe any experimental setup or computational results, thus no hardware specifications are provided. |
| Software Dependencies | No | The paper is theoretical and does not describe any empirical experiments, therefore no software dependencies with version numbers are mentioned. |
| Experiment Setup | No | The paper focuses on theoretical analysis and derivation of minimax rates and optimal tests, without conducting any empirical experiments. Therefore, there are no experimental setup details or hyperparameters discussed. |