Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Learning Sample-Specific Models with Low-Rank Personalized Regression
Authors: Ben Lengerich, Bryon Aragam, Eric P. Xing
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare personalized regression (hereafter, PR) to four baselines: 1) Population linear or logistic regression, 2) A mixture regression (MR) model, 3) Varying coefficients (VC), 4) Deep neural networks (DNN). First, we evaluate each method's ability to recover the true parameters from simulated data. Then we present three real data case studies, each progressively more challenging than the previous: 1) Stock prediction using financial data, 2) Cancer diagnosis from mass spectrometry data, and 3) Electoral prediction using historical election data. The results are summarized in Table 1 for easy reference. |
| Researcher Affiliation | Academia | Benjamin Lengerich (Carnegie Mellon University), Bryon Aragam (University of Chicago), Eric P. Xing (Carnegie Mellon University) |
| Pseudocode | Yes | Algorithm 1 Personalized Estimation |
| Open Source Code | Yes | A Python implementation is available at http://www.github.com/blengerich/personalized_regression. |
| Open Datasets | Yes | Here, we investigate the capacity of PR to distinguish malignant from benign skin lesions using a dataset of desorption electrospray ionization mass spectrometry imaging (DESI-MSI) of a common skin cancer, basal cell carcinoma (BCC) [22] (details in supplement). |
| Dataset Splits | No | The paper refers to 'test sets' and 'out-of-sample prediction results', implying a split for evaluation, but does not specify explicit percentages, counts, or a standard citation for train/validation/test dataset splits. |
| Hardware Specification | Yes | With these performance improvements, we are able to fit models to datasets with over 10,000 samples and 1000s of predictors on a Macbook Pro with 16GB RAM in under an hour. |
| Software Dependencies | No | The paper mentions 'A Python implementation' but does not specify specific software dependencies with version numbers. |
| Experiment Setup | Yes | A discussion of hyperparameter selection is contained in Section B.3 of the supplement. Each personalized estimator is endowed with a personalized learning rate $\alpha_t^{(i)} = \alpha_t / \lVert \hat{\theta}_t^{(i)} - \hat{\theta}^{(\mathrm{pop})} \rVert_\infty$, which scales the global learning rate $\alpha_t$ according to how far the estimator has traveled. In our experiments, we use $k_n = 3$. |
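
The personalized learning rate quoted in the Experiment Setup row divides a global learning rate by each estimator's distance from the population estimate. The following is a minimal NumPy sketch of that scaling, not the authors' implementation. The function name, argument names, and the `eps` floor are hypothetical, and the norm order is exposed as a parameter (defaulting to the max norm as an assumption) so it can be matched to the paper.

```python
import numpy as np


def personalized_learning_rates(estimates, pop_estimate, global_lr,
                                norm_ord=np.inf, eps=1e-12):
    """Scale a global learning rate per sample by distance from the population estimate.

    estimates:    (n_samples, n_features) current personalized estimates
    pop_estimate: (n_features,) population estimate
    global_lr:    scalar global learning rate at the current step
    norm_ord:     norm order for np.linalg.norm (assumption; adjust to match the paper)
    eps:          hypothetical floor that avoids division by zero when an
                  estimate still equals the population estimate
    """
    estimates = np.asarray(estimates, dtype=float)
    pop_estimate = np.asarray(pop_estimate, dtype=float)
    # Row-wise distance of each personalized estimate from the population estimate.
    dist = np.linalg.norm(estimates - pop_estimate, ord=norm_ord, axis=1)
    # Estimators that have traveled farther receive a smaller step size.
    return global_lr / np.maximum(dist, eps)


# Illustrative values only: two samples, two features.
estimates = np.array([[1.0, 2.0], [0.5, 0.0]])
pop_estimate = np.zeros(2)
rates = personalized_learning_rates(estimates, pop_estimate, global_lr=0.1)
```

In this toy input the first estimate lies four times farther from the population estimate than the second, so its learning rate is four times smaller.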