Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Co-Regularization Enhances Knowledge Transfer in High Dimensions

Authors: Shuo Shuo Liu, Haotian Lin, Matthew Reimherr, Runze Li

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we conduct empirical studies on synthetic data and real COVID-19 data to demonstrate the effectiveness of CoRT. 4.1 Simulation: In this section, we compare CoRT to the existing methods, including (1) naive-Lasso: Lasso regression on the target data only, which serves as the baseline of our study; (2) Trans-GLM (Tian and Feng, 2023): the two-step transfer model for GLMs. CoRT (Lasso) and CoRT (SCAD) represent the training with Lasso and SCAD regularizers, respectively.
Researcher Affiliation	Collaboration	Shuo Shuo Liu¹, Haotian Lin¹,², Matthew Reimherr¹,², Runze Li¹; ¹The Pennsylvania State University, ²Amazon
Pseudocode	Yes	Algorithm 1: Adaptive Co-Regularization Transfer. 1: Input: Datasets $D^{(k)}$ for $0 \le k \le K$. 2: Data splitting: randomly split the target data $D^{(0)}$ into $T$ (an odd number) parts of equal size and denote them as $D^{(0)}_{[1]}, \ldots, D^{(0)}_{[T]}$.
Open Source Code	Yes	Code is available at https://github.com/shuoshuoliu/UTrans-CoRT.
Open Datasets	Yes	The data that we study is synthesized from various government and nonprofit institutions for all 3142 counties in the US. The data are stored from 1/22/20-12/21/20 and publicly available (https://github.com/lin-lab/COVID-Health-Disparities). We refer interested readers to Li et al. (2021) for more details about the data collection. County-level characteristics include demographic, race, socioeconomic, and medical comorbidities variables.
Dataset Splits	Yes	We randomly split the target data into 80% for training and the remaining for testing.
Hardware Specification	Yes	All the simulations are on a desktop computer running Windows 11 with an Intel Core i9 CPU at 5 GHz with 32 GB of RAM.
Software Dependencies	No	To implement the methods mentioned in Section 4.1, we use R package glmnet for naive-Lasso, R package glmtrans for Trans-GLM, and R package ncvreg for our proposed CoRT (Breheny and Huang, 2011; R Core Team, 2025). ... We use the R packages randomForest, xgboost, and e1071 (R Core Team, 2025) for implementations.
Experiment Setup	Yes	We set $h \in \{5, 10, 20, 40\}$, $n_0 \in \{50, 75, 100\}$, and $n_k = 200$ for all $k = 1, \ldots, \lvert S \rvert$. We use max_depth=15 and nrounds=50 for XGBoost and default parameters in the other packages.
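
The splitting and grid settings quoted in the table can be illustrated with a short sketch. This is not the authors' code (their implementation is in R, at the repository linked above); it is a minimal Python illustration. The function names, the seeds, the example sample size of 100, and the choice T = 5 are assumptions made here for demonstration only, and the two splits are shown independently rather than as a pipeline the paper prescribes.

```python
import itertools
import random


def train_test_split_indices(n, train_frac=0.8, seed=0):
    """Randomly split indices 0..n-1 into a training part and a test part."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]


def split_into_parts(indices, T, seed=0):
    """Randomly split indices into T parts, where T must be odd.

    Parts are exactly equal in size when len(indices) is divisible by T,
    and differ by at most one element otherwise.
    """
    if T % 2 == 0:
        raise ValueError("T must be an odd number")
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    # Striding assigns every T-th shuffled index to the same part.
    return [shuffled[t::T] for t in range(T)]


# Simulation grid quoted in the table: h in {5, 10, 20, 40},
# n_0 in {50, 75, 100}, and n_k = 200 for every source dataset.
H_VALUES = [5, 10, 20, 40]
N0_VALUES = [50, 75, 100]
N_K = 200
grid = list(itertools.product(H_VALUES, N0_VALUES))

# Hypothetical target sample of size 100 (an assumption for illustration).
train, test = train_test_split_indices(100)
parts = split_into_parts(range(100), T=5)
print(len(train), len(test), [len(p) for p in parts], len(grid))
# prints: 80 20 [20, 20, 20, 20, 20] 12
```

Fixing the seed makes both splits repeatable across runs, which is the property the "Dataset Splits" row is checking for; the paper itself does not state a seed.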