Data-Driven Subgroup Identification for Linear Regression

Authors: Zachary Izzo, Ruishan Liu, James Zou

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on real-world medical datasets confirm that the method can discover regions where a local linear model has improved performance.
Researcher Affiliation | Academia | Department of Mathematics, Stanford University, USA; Department of Biomedical Data Science, Stanford University, USA.
Pseudocode | Yes | Algorithm 1 COREGROUP(k, D) ... Algorithm 2 GROWBOX(x, X_rej, U) ... Algorithm 3 DDSUBGROUP(k, U, D)
Open Source Code | Yes | Code for the experiments can be found at https://github.com/zleizzo/DDGroup.
Open Datasets | Yes | Brazil Health Dataset (Cavalcante et al., 2018) ... China Glucose Dataset (Wang et al., 2017) ... China HIV Dataset (Zhang et al., 2016) ... Dutch Drinking dataset (Boelema et al., 2015) ... Korea Grip Dataset (Wen et al., 2017)
Dataset Splits | Yes | For the real-world datasets, we randomly split them into training, test, and validation sets with ratios of 50%, 30%, and 20%, respectively.
Hardware Specification | Yes | The average runtime for Algorithm 3 across one run of each dataset was 1.98 seconds on an AMD 7502 CPU, and no individual dataset took longer than 10 seconds.
Software Dependencies | No | The paper links to its code repository but does not explicitly list software dependencies or their version numbers in the text.
Experiment Setup | Yes | For DDGroup, we used a more general form of the threshold, $\rho_{\gamma_1,\gamma_2}(x_i) = \sigma\gamma_1\|x_i\| + \sigma\gamma_2$, and tuned $\gamma_1$ and $\gamma_2$ as additional hyperparameters. Specifically, the algorithm works well with $\gamma_2 = 0$ and $\gamma_1$ tuned over $\{2^{-4}, 2^{-3}, \ldots, 2^{5}\}$. We also set the size $k$ of the core group equal to $p$ times the size of the training set, where $p$ was selected from $\{0.01, 0.05, 0.1, 0.15, 0.2\}$. ... We did a hyperparameter search over constant rejection thresholds $\rho \in \{2, 4, 8, 16, 32, 64\}$. The core group size was always chosen to be $k = n/20$. ... We did a hyperparameter search over $\delta \in \{0.1, 0.05, 0.025, 0.01\}$.
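The 50%/30%/20% training/test/validation split described for the real-world datasets can be reproduced roughly as below. This is a minimal sketch, assuming a single random permutation of row indices with NumPy. The function name split_train_test_val, the seed, and the rounding rule are hypothetical choices, not taken from the DDGroup repository.

```python
import numpy as np

def split_train_test_val(n, seed=0, fracs=(0.5, 0.3, 0.2)):
    """Randomly partition indices 0..n-1 into train/test/validation index arrays.

    Hypothetical helper: fracs follows the 50/30/20 ratio; the validation set
    takes whatever remains after rounding the first two sizes.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)                     # shuffled row indices
    n_train = int(round(fracs[0] * n))
    n_test = int(round(fracs[1] * n))
    train_idx = perm[:n_train]
    test_idx = perm[n_train:n_train + n_test]
    val_idx = perm[n_train + n_test:]             # remainder, about 20%
    return train_idx, test_idx, val_idx

train_idx, test_idx, val_idx = split_train_test_val(1000)
print(len(train_idx), len(test_idx), len(val_idx))  # 500 300 200
```

Splitting indices rather than arrays lets the same partition be applied to both the covariates and the responses.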
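To make the DDGroup threshold and its tuning grid concrete, the sketch below computes the per-point threshold and enumerates the stated grids for the threshold scale and the core-group fraction p. It assumes that ‖x_i‖ is the Euclidean norm of the covariate vector and that the noise scale σ is supplied by the caller. The residual-based rule in flag_rejections, and all function names, are hypothetical illustrations of how such a threshold might be used. They are not the authors' GROWBOX or DDSUBGROUP code.

```python
import numpy as np

def rejection_threshold(X, sigma, gamma1, gamma2=0.0):
    """Per-point threshold rho(x_i) = sigma*gamma1*||x_i|| + sigma*gamma2.

    Assumes ||.|| is the Euclidean norm of each row of X.
    """
    return sigma * gamma1 * np.linalg.norm(X, axis=1) + sigma * gamma2

def flag_rejections(X, y, theta, sigma, gamma1, gamma2=0.0):
    """Hypothetical rule: flag points whose absolute residual under the
    linear model theta exceeds their threshold."""
    residuals = np.abs(y - X @ theta)
    return residuals > rejection_threshold(X, sigma, gamma1, gamma2)

# Grids from the experiment setup: gamma2 fixed at 0, gamma1 in {2^-4, ..., 2^5}.
gamma1_grid = [2.0 ** e for e in range(-4, 6)]
p_grid = [0.01, 0.05, 0.1, 0.15, 0.2]

def core_group_sizes(n_train):
    """k = p * n_train for each p; rounding to an integer is an assumption."""
    return [max(1, int(round(p * n_train))) for p in p_grid]

print(gamma1_grid)            # [0.0625, 0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
print(core_group_sizes(500))  # [5, 25, 50, 75, 100]
```

Because the threshold grows with ‖x_i‖, points with large covariates are allowed larger residuals before being flagged. This is one plausible reason to prefer the norm-dependent form over the constant thresholds used for the baseline.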