Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Co-Regularization Enhances Knowledge Transfer in High Dimensions
Authors: Shuo Shuo Liu, Haotian Lin, Matthew Reimherr, Runze Li
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct empirical studies on synthetic data and real COVID-19 data to demonstrate the effectiveness of CoRT. 4.1 Simulation. In this section, we compare CoRT to the existing methods, including (1) naive-Lasso: Lasso regression on the target data only, which serves as the baseline of our study; (2) Trans-GLM (Tian and Feng, 2023): the two-step transfer model for GLMs. CoRT (Lasso) and CoRT (SCAD) represent the training with Lasso and SCAD regularizers, respectively. |
| Researcher Affiliation | Collaboration | Shuo Shuo Liu¹, Haotian Lin¹,², Matthew Reimherr¹,², Runze Li¹. ¹The Pennsylvania State University, ²Amazon |
| Pseudocode | Yes | Algorithm 1: Adaptive Co-Regularization Transfer. 1: Input: Datasets $D^{(k)}$ for $0 \le k \le K$. 2: Data splitting: randomly split the target data $D^{(0)}$ into $T$ (an odd number) parts of equal size and denote them as $D^{(0)}_{[1]}, \ldots, D^{(0)}_{[T]}$. |
| Open Source Code | Yes | Code is available at https://github.com/shuoshuoliu/UTrans-CoRT. |
| Open Datasets | Yes | The data that we study is synthesized from various government and nonprofit institutions for all 3142 counties in the US. The data are stored from 1/22/20-12/21/20 and publicly available (https://github.com/lin-lab/COVID-Health-Disparities). We refer interested readers to Li et al. (2021) for more details about the data collection. County-level characteristics include demographic, race, socioeconomic, and medical comorbidities variables. |
| Dataset Splits | Yes | We randomly split the target data into 80% for training and the remaining for testing. |
| Hardware Specification | Yes | All the simulations are on a desktop computer running Windows 11 with an Intel Core i9 CPU at 5 GHz with 32 GB of RAM. |
| Software Dependencies | No | To implement the methods mentioned in Section 4.1, we use R package glmnet for naive-Lasso, R package glmtrans for Trans-GLM, and R package ncvreg for our proposed CoRT (Breheny and Huang, 2011; R Core Team, 2025). ... We use the R packages randomForest, xgboost, and e1071 (R Core Team, 2025) for implementations. |
| Experiment Setup | Yes | We set $h \in \{5, 10, 20, 40\}$, $n_0 \in \{50, 75, 100\}$, and $n_k = 200$ for all $k = 1, \ldots, \lvert S \rvert$. We use max_depth=15 and nrounds=50 for XGBoost and default parameters in the other packages. |
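
The two data-splitting steps quoted in the table (the split of the target data into an odd number T of equal-size parts, and the 80%/20% train/test split) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which is written in R; the function names `split_into_parts` and `train_test_indices`, the seed handling, and the use of NumPy are assumptions made here purely for illustration.

```python
import numpy as np


def split_into_parts(n_samples, n_parts, seed=0):
    """Randomly partition indices 0..n_samples-1 into n_parts parts.

    Hypothetical helper: mirrors the quoted step "randomly split the
    target data into T (an odd number) parts of equal size". Parts are
    exactly equal only when n_parts divides n_samples.
    """
    if n_parts % 2 == 0:
        raise ValueError("n_parts must be odd")
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    return np.array_split(shuffled, n_parts)


def train_test_indices(n_samples, train_frac=0.8, seed=0):
    """Randomly split indices into a training set and a test set.

    Hypothetical helper: mirrors the quoted 80% training / remaining
    testing split of the target data.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    n_train = int(round(train_frac * n_samples))
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    parts = split_into_parts(n_samples=100, n_parts=5)
    train_idx, test_idx = train_test_indices(n_samples=100)
    print([len(p) for p in parts], len(train_idx), len(test_idx))
```

Fixing the seed makes a given split repeatable, which matters when comparing methods on the same partition; the source does not state which seeds, if any, were used.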