Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Partial Correlation Network Estimation by Semismooth Newton Methods
Authors: DongWon Kim, Sungdong Lee, Joong-Ho (Johann) Won
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both simulated and real-world genomic datasets demonstrate the superior convergence behavior and computational efficiency of the proposed algorithm, which position our method as a promising tool for massive-scale network analysis sought for in, e.g., modern multi-omics research. |
| Researcher Affiliation | Academia | Dongwon Kim, Department of Statistics, Seoul National University; Sungdong Lee, Department of Medicine, National University of Singapore; Joong-Ho Won, Department of Statistics, Seoul National University |
| Pseudocode | Yes | Algorithm 1: Damped B-semismooth Newton method with line search |
| Open Source Code | No | The paper does not provide an explicit link to a code repository, a statement that code is included in supplementary materials, or an unambiguous sentence stating the release of code for the described methodology within the main text. |
| Open Datasets | Yes | Specifically, we apply Algorithm 1 and proximal gradient method (ACCORD-FBS) to the LIHC dataset from The Cancer Genome Atlas [1] |
| Dataset Splits | No | Section 4.1 describes generating synthetic data: "generated n = 500 observations from a p = 1,000-dimensional zero-mean multivariate Gaussian distribution". No explicit train/test/validation splits are mentioned for this generated data. Section 4.3 mentions the LIHC dataset "consisting of p = 305,471 features including RNA transcription levels and the DNA methylation status of human genomes from n = 365 samples." It does not mention how these samples were split into train/test/validation sets for the experiments. |
| Hardware Specification | Yes | Matrix operations are parallelized on six NVIDIA RTX 6000 Ada generation GPUs using CUDA 11.7. |
| Software Dependencies | Yes | The algorithm is implemented in PyTorch 1.13.1. Matrix operations are parallelized on six NVIDIA RTX 6000 Ada generation GPUs using CUDA 11.7. |
| Experiment Setup | Yes | To assess the convergence behavior, we generated n = 500 observations from a p = 1,000-dimensional zero-mean multivariate Gaussian distribution with precision matrix Θ that contained 3% non-zero entries, locations of which were sampled uniformly at random. The regularization parameter λ was set to 0.1, 0.15, and 0.2 to ensure that the estimated precision matrix exhibits a sparsity level comparable to Θ. For the line search parameter (line 11), we use ρ_k = max{0.7 − 0.003k, 0.4} and σ = 0.001. The regularization parameter was set to λ = 0.45, selected via the extended pseudo-Bayesian information criterion [19]; a detailed sensitivity analysis of estimation performance around this value of λ is provided in Appendix G.2. To test robustness of Algorithm 1, we further examined the optimization performance for λ ∈ {0.20, 0.60}, corresponding to relatively dense and sparse estimates. |
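
The synthetic setup quoted in the Experiment Setup row can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code (the report found no code release). The excerpt does not say how the values or diagonal of Θ were chosen, so this sketch assumes that the 3% density applies to off-diagonal entries, that non-zero values have magnitude in [0.5, 1.0] with a random sign, and that positive definiteness is enforced by a diagonal shift. It also assumes the line-search schedule reads ρ_k = max{0.7 − 0.003k, 0.4}. The function names `sparse_precision`, `sample_gaussian`, `offdiag_density`, and `rho` are hypothetical.

```python
import numpy as np


def sparse_precision(p, density=0.03, seed=0):
    """Hypothetical sparse, positive-definite precision matrix Theta.

    Assumptions (not stated in the report): `density` is the fraction of
    non-zero OFF-diagonal entries, non-zero values have magnitude in
    [0.5, 1.0] with random sign, and a diagonal shift enforces PD.
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.triu_indices(p, k=1)           # strict upper triangle
    keep = rng.random(rows.size) < density          # uniform random support
    k = int(keep.sum())
    vals = rng.uniform(0.5, 1.0, size=k) * rng.choice([-1.0, 1.0], size=k)
    theta = np.zeros((p, p))
    theta[rows[keep], cols[keep]] = vals
    theta = theta + theta.T                         # symmetric support
    # Shift the diagonal so the smallest eigenvalue equals 1 (positive definite).
    theta += (1.0 - np.linalg.eigvalsh(theta).min()) * np.eye(p)
    return theta


def sample_gaussian(theta, n, seed=1):
    """Draw n rows from N(0, Theta^{-1}) without forming the covariance."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(theta)                # theta = chol @ chol.T
    z = rng.standard_normal((n, theta.shape[0]))
    # x = chol^{-T} z has covariance (chol chol^T)^{-1} = Theta^{-1}
    return np.linalg.solve(chol.T, z.T).T


def offdiag_density(m):
    """Fraction of non-zero off-diagonal entries (to compare sparsity levels)."""
    off = m[~np.eye(m.shape[0], dtype=bool)]
    return np.count_nonzero(off) / off.size


def rho(k):
    """Assumed line-search schedule rho_k = max{0.7 - 0.003 k, 0.4}."""
    return max(0.7 - 0.003 * k, 0.4)


if __name__ == "__main__":
    theta = sparse_precision(1000, density=0.03)
    X = sample_gaussian(theta, n=500)
    print(X.shape, round(offdiag_density(theta), 3), rho(0), rho(150))
```

Sampling through the Cholesky factor of Θ avoids explicitly inverting the 1,000 × 1,000 precision matrix. `offdiag_density` can also be applied to an estimated precision matrix to check that its sparsity is comparable to Θ, which is how the report describes the choice of λ for the synthetic experiments.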