Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Partial Correlation Network Estimation by Semismooth Newton Methods

Authors: DongWon Kim, Sungdong Lee, Joong-Ho (Johann) Won

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on both simulated and real-world genomic datasets demonstrate the superior convergence behavior and computational efficiency of the proposed algorithm, which position our method as a promising tool for massive-scale network analysis sought for in, e.g., modern multi-omics research.
Researcher Affiliation	Academia	Dongwon Kim, Department of Statistics, Seoul National University; Sungdong Lee, Department of Medicine, National University of Singapore; Joong-Ho Won, Department of Statistics, Seoul National University
Pseudocode	Yes	Algorithm 1: Damped B-semismooth Newton method with line search
Open Source Code	No	The paper does not provide an explicit link to a code repository, a statement that code is included in supplementary materials, or an unambiguous sentence stating the release of code for the described methodology within the main text.
Open Datasets	Yes	Specifically, we apply Algorithm 1 and proximal gradient method (ACCORD-FBS) to the LIHC dataset from The Cancer Genome Atlas [1]
Dataset Splits	No	Section 4.1 describes generating synthetic data: "generated n = 500 observations from a p = 1,000-dimensional zero-mean multivariate Gaussian distribution". No explicit train/test/validation splits are mentioned for this generated data. Section 4.3 mentions the LIHC dataset "consisting of p = 305,471 features including RNA transcription levels and the DNA methylation status of human genomes from n = 365 samples." It does not mention how these samples were split into train/test/validation sets for the experiments.
Hardware Specification	Yes	Matrix operations are parallelized on six NVIDIA RTX 6000 Ada generation GPUs using CUDA 11.7.
Software Dependencies	Yes	The algorithm is implemented in PyTorch 1.13.1. Matrix operations are parallelized on six NVIDIA RTX 6000 Ada generation GPUs using CUDA 11.7.
Experiment Setup	Yes	To assess the convergence behavior, we generated n = 500 observations from a p = 1,000-dimensional zero-mean multivariate Gaussian distribution with precision matrix Θ that contained 3% non-zero entries, locations of which were sampled uniformly at random. The regularization parameter λ was set to 0.1, 0.15, and 0.2 to ensure that the estimated precision matrix exhibits a sparsity level comparable to Θ. For the line search parameter (line 11), we use ρ_k = max{0.7 − 0.003k, 0.4} and σ = 0.001. The regularization parameter was set to λ = 0.45, selected via the extended pseudo-Bayesian information criterion [19]; a detailed sensitivity analysis of estimation performance around this value of λ is provided in Appendix G.2. To test robustness of Algorithm 1, we further examined the optimization performance for λ ∈ {0.20, 0.60}, corresponding to relatively dense and sparse estimates.
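The Experiment Setup row quotes an iteration-dependent factor ρ_k = max{0.7 − 0.003k, 0.4} and σ = 0.001 for the line search at line 11 of Algorithm 1, but it does not say how these enter the acceptance test. The Python sketch below assumes a standard Armijo backtracking rule: starting from the full Newton step t = 1, the step is multiplied by ρ_k until f(x + t·d) ≤ f(x) + σ·t·⟨∇f(x), d⟩. The names rho_schedule and armijo_backtracking, the iteration cap, and the toy quadratic objective are hypothetical stand-ins, not the authors' code (no code release is identified in the Open Source Code row). In the paper the objective is nonsmooth and the direction comes from a B-semismooth Newton step, which this sketch does not reproduce.

```python
import math
from typing import Callable, Sequence


def rho_schedule(k: int) -> float:
    """Backtracking factor rho_k = max{0.7 - 0.003k, 0.4} as quoted in the setup."""
    return max(0.7 - 0.003 * k, 0.4)


def armijo_backtracking(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    grad_fx: Sequence[float],
    d: Sequence[float],
    k: int,
    sigma: float = 1e-3,      # sigma = 0.001 as quoted in the setup
    max_backtracks: int = 60,  # hypothetical safeguard, not from the source
) -> float:
    """Hypothetical Armijo rule: shrink t by rho_k until
    f(x + t d) <= f(x) + sigma * t * <grad f(x), d>."""
    slope = sum(g * di for g, di in zip(grad_fx, d))
    if slope >= 0:
        raise ValueError("d is not a descent direction")
    rho = rho_schedule(k)
    fx = f(x)
    t = 1.0  # start from the full (undamped) Newton step
    for _ in range(max_backtracks):
        trial = [xi + t * di for xi, di in zip(x, d)]
        if f(trial) <= fx + sigma * t * slope:
            return t
        t *= rho  # damp the step by the iteration-dependent factor
    raise RuntimeError("line search did not find an acceptable step")


if __name__ == "__main__":
    # Toy smooth objective f(x) = ||x||^2 standing in for the real objective.
    def quadratic(v: Sequence[float]) -> float:
        return sum(vi * vi for vi in v)

    x = [1.0, -2.0]
    g = [2.0 * xi for xi in x]         # gradient of ||x||^2
    steepest = [-gi for gi in g]       # gradient direction: full step overshoots
    newton = [-gi / 2.0 for gi in g]   # exact Newton direction (Hessian = 2I)
    print(armijo_backtracking(quadratic, x, g, steepest, k=0))  # 0.7
    print(armijo_backtracking(quadratic, x, g, newton, k=0))    # 1.0
```

In this toy run the gradient step is damped once to t = 0.7, while the exact Newton step is accepted at t = 1. Under this assumed reading, ρ_k shrinks rejected steps more aggressively as k grows, but never by a factor smaller than 0.4.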