Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Structure-Leveraged Methods in Breast Cancer Risk Prediction
Authors: Jun Fan, Yirong Wu, Ming Yuan, David Page, Jie Liu, Irene M. Ong, Peggy Peissig, Elizabeth Burnside
JMLR 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted a retrospective case-control study, garnering 49 mammography descriptors and 77 high-frequency/low-penetrance single-nucleotide polymorphisms (SNPs) from an existing personalized medicine data repository. ... Section 3 presents the results. ... Each combination of these parameters is evaluated using stratified 5-fold cross-validation, and AUC (the area under the receiver operating characteristic (ROC) curve) is used as the performance measure. ... In this section, we demonstrate the performance of the ℓp fused group lasso logistic regression method from three aspects: the significant improvement of AUCs by considering the structure information, the predictive performance under different p (or p1 and p2), and the detected important mammography features and SNPs. |
| Researcher Affiliation | Academia | Jun Fan EMAIL Department of Statistics University of Wisconsin-Madison 1300 University Avenue, Madison, WI 53706, United States; Yirong Wu EMAIL Department of Radiology University of Wisconsin-Madison 600 Highland Avenue, Madison, WI 53792, United States; Ming Yuan EMAIL Department of Statistics University of Wisconsin-Madison 1300 University Avenue, Madison, WI 53706, United States; David Page EMAIL Department of Biostatistics and Medical Informatics University of Wisconsin-Madison 600 Highland Avenue, Madison, WI 53792, United States; Jie Liu EMAIL Department of Genome Sciences University of Washington-Seattle 3720 15th Avenue, Seattle, WA 98105, United States; Irene M. Ong EMAIL Department of Biostatistics and Medical Informatics University of Wisconsin-Madison 600 Highland Avenue, Madison, WI 53792, United States; Peggy Peissig EMAIL Marshfield Clinic Research Foundation 1000 North Oak Avenue, Marshfield, WI 54449, United States; Elizabeth Burnside EMAIL Department of Radiology University of Wisconsin-Madison 600 Highland Avenue, Madison, WI 53792, United States |
| Pseudocode | No | Many algorithms have been proposed in the literature to solve logistic regression with fused lasso regularization (Lin, 2015; Yu et al., 2015). In this subsection we adopt the fast iterative shrinkage-thresholding algorithm (Beck and Teboulle, 2009) to solve (2) as $\beta^{k+1} = \arg\min_{\beta \in \mathbb{R}^d} L(\beta^k) + \langle \beta - \beta^k, \nabla L(\beta^k) \rangle + \frac{\tau}{2}\|\beta - \beta^k\|_2^2 + \lambda_1 \sum_g d_g \|\beta_g\|_2 + \lambda_2 \sum_g \|D_g \beta_g\|_p^p$ with $\beta = (\beta_1, \ldots, \beta_d)^T$ and $\tau > 0$ the Lipschitz constant of $\nabla L(\cdot)$. And the iteration step is equivalent to solving ... With the help of these proximity operators and the Bregman splitting algorithm (Ye and Xie, 2011), we can solve (5) by iteratively solving the following procedures: $\beta^{k+1} = \arg\min_{\beta_g} \frac{1}{2}\|\beta_g - z\|_2^2 + \langle u^k, \beta_g - a^k \rangle + \langle v^k, D_g \beta_g - b^k \rangle + \frac{\mu}{2}\|\beta_g - a^k\|_2^2 + \frac{\mu}{2}\|D_g \beta_g - b^k\|_2^2$; $a^{k+1} = \arg\min_a \rho_1 \|a\|_2 + \langle u^k, \beta^{k+1} - a \rangle + \frac{\mu}{2}\|\beta^{k+1} - a\|_2^2$; $b^{k+1} = \arg\min_b \rho_2 \|b\|_p^p + \langle v^k, D_g \beta^{k+1} - b \rangle + \frac{\mu}{2}\|D_g \beta^{k+1} - b\|_2^2$; $u^{k+1} = u^k + \mu(\beta^{k+1} - a^{k+1})$; $v^{k+1} = v^k + \mu(D_g \beta^{k+1} - b^{k+1})$, where $\mu$ acts like a step size in this algorithm. Remark 1: The minimization over $\beta$, $a$, and $b$ can all be solved in closed form: $\beta^{k+1} = [(\mu + 1)I + \mu D_g^T D_g]^{-1}[z + \mu(a^k - u^k/\mu) + \mu D_g^T(b^k - v^k/\mu)]$; $a^{k+1} = S_{2,1}(\beta^{k+1} + u^k/\mu, \rho_1/\mu)$; $b^{k+1} = S_p(D_g \beta^{k+1} + v^k/\mu, \rho_2/\mu)$. While this section describes the algorithmic steps and procedures, it does so using mathematical notation and descriptive text rather than a formal pseudocode block or algorithm environment. |
| Open Source Code | No | No explicit statement about open-source code for the methodology described in this paper is found. |
| Open Datasets | No | The Marshfield Clinic Institutional Review Board approved the use of Marshfield Clinic's Personalized Medicine Research Project (PMRP) (McCarty et al., 2005) cohort in our study. The population-based PMRP cohort, details of which have been previously published (McCarty et al., 2005), was used in this study. Though the details of this population have been described previously (Burnside et al., 2015), we will summarize here, in brief, for the convenience of the reader. ... We identified 362 cases and 376 controls (738 in total) who have both genetics and mammogram data available. The paper uses data from the Marshfield Clinic's PMRP cohort, which is referenced by previous publications, but does not provide direct access information (e.g., a link, DOI, or repository) for the specific dataset used in this study. |
| Dataset Splits | Yes | All 738 samples are randomly partitioned into five equal-sized folds with approximately equal proportions of cases and controls. In each of the five iterations, four folds are used as the training set and the remaining one as the validation set to compute AUC. The parameters with the best average AUC are selected. Finally, we repeat this process ten times and report the average AUC. |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU models, or cloud computing instances) used for running experiments are mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned. The paper discusses various algorithms and methods but does not list the programming languages or libraries used for implementation with their versions. |
| Experiment Setup | Yes | The ℓp fused group lasso logistic regression method has several parameters. The tuning parameters λ1 and λ2 vary over a given set of values, and the shrinkage parameter p (or p1 and p2) over {1, 4/3, 3/2, 2}. Each combination of these parameters is evaluated using stratified 5-fold cross-validation, with AUC (the area under the receiver operating characteristic (ROC) curve) as the performance measure. The parameters with the best average AUC are selected. Finally, we repeat this process ten times and report the average AUC. We obtain p-values by performing a two-tailed two-sample t-test when comparing AUCs. |
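The closed-form updates quoted in the Pseudocode row rely on two proximity operators: the group soft-thresholding map S_{2,1} and, for the p = 1 case of S_p, elementwise soft-thresholding. A minimal Python sketch of both follows; the function names are our own, and this is an illustration rather than the authors' implementation.

```python
import math

def soft_threshold_group(x, t):
    """S_{2,1}(x, t): proximal operator of t * ||x||_2.
    Shrinks the whole vector x toward zero by t in Euclidean norm;
    the vector collapses to zero when its norm is at most t."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= t:
        return [0.0] * len(x)
    return [(1.0 - t / norm) * v for v in x]

def soft_threshold_elementwise(x, t):
    """S_1(x, t): proximal operator of t * ||x||_1,
    applied coordinate-wise (the p = 1 case of S_p)."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in x]
```

For p strictly between 1 and 2, S_p has no such simple closed form and is typically evaluated by a one-dimensional root-finding step per coordinate.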
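The iterative scheme in the Pseudocode row is a proximal-gradient method for the penalized logistic loss. The sketch below is a deliberately simplified illustration: the λ2 fusion penalty is omitted, no FISTA momentum is used, and all names are hypothetical; it is not the authors' code.

```python
import math

def fit_group_lasso_logistic(X, y, groups, lam, step=0.5, iters=200):
    """Proximal-gradient sketch of group-lasso logistic regression.

    X is a list of feature vectors, y a list of 0/1 labels, and
    `groups` a list of index lists defining the feature groups."""
    d = len(X[0])
    n = len(X)
    beta = [0.0] * d
    for _ in range(iters):
        # Gradient of the average logistic loss.
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            z = sum(b * v for b, v in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        beta = [b - step * g for b, g in zip(beta, grad)]
        # Proximal step: group soft-thresholding, one group at a time.
        for g in groups:
            norm = math.sqrt(sum(beta[j] ** 2 for j in g))
            scale = 0.0 if norm <= step * lam else 1.0 - step * lam / norm
            for j in g:
                beta[j] *= scale
    return beta
```

With a sufficiently large λ the proximal step zeroes out an entire group at once, which is the mechanism the paper uses to select (or discard) whole blocks of mammography features and SNPs.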
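The evaluation protocol in the Dataset Splits and Experiment Setup rows (stratified 5-fold partitions with AUC as the performance measure) can be sketched in plain Python. This is a generic illustration of the protocol under our own naming, not the authors' code; AUC is computed via its Mann-Whitney interpretation.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Partition sample indices into k folds with approximately
    equal proportions of cases (1) and controls (0)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic:
    the probability that a random case outscores a random control,
    counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In the paper's protocol, each parameter combination would be scored by averaging `auc` over the five held-out folds, with the whole procedure repeated ten times under fresh random partitions.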