Data-Driven Subgroup Identification for Linear Regression
Authors: Zachary Izzo, Ruishan Liu, James Zou
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | experiments on real-world medical datasets confirm that it can discover regions where a local linear model has improved performance. |
| Researcher Affiliation | Academia | ¹Department of Mathematics, Stanford University, USA; ²Department of Biomedical Data Science, Stanford University, USA. |
| Pseudocode | Yes | Algorithm 1 COREGROUP(k, D) ... Algorithm 2 GROWBOX( x, Xrej, U) ... Algorithm 3 DDSUBGROUP(k, U, D) |
| Open Source Code | Yes | Code for the experiments can be found at https://github.com/zleizzo/DDGroup. |
| Open Datasets | Yes | Brazil Health Dataset (Cavalcante et al., 2018) ... China Glucose Dataset (Wang et al., 2017) ... China HIV Dataset (Zhang et al., 2016) ... Dutch Drinking dataset (Boelema et al., 2015) ... Korea Grip Dataset (Wen et al., 2017) |
| Dataset Splits | Yes | For the real-world datasets, we randomly split them into training, test and validation sets, with ratio 50%, 30% and 20%. |
| Hardware Specification | Yes | The average runtime for Algorithm 3 across one run of each dataset was 1.98 seconds on an AMD 7502 CPU, and no individual dataset took longer than 10 seconds. |
| Software Dependencies | No | The paper provides a link to its code repository, but it does not explicitly list software dependencies with specific version numbers within the text. |
| Experiment Setup | Yes | For DDGroup, we used a more general form of the threshold $\rho_{\gamma_1,\gamma_2}(x_i) = \sigma\gamma_1 x_i + \sigma\gamma_2$ and tuned $\gamma_1$ and $\gamma_2$ as additional hyperparameters. Specifically, the algorithm works well by simply setting $\gamma_2 = 0$ and tuning $\gamma_1 \in \{2^{-4}, 2^{-3}, \ldots, 2^{5}\}$. We also set the size $k$ of the core group equal to $p$ times the size of the training set, where $p$ was selected from within $\{0.01, 0.05, 0.1, 0.15, 0.2\}$. ... We did a hyperparameter search over constant rejection thresholds $\rho \in \{2, 4, 8, 16, 32, 64\}$. The core group size was always chosen to be $k = n/20$. ... We did a hyperparameter search over $\delta \in \{0.1, 0.05, 0.025, 0.01\}$. |
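
The split ratio and the DDGroup search grid reported in the table can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `split_indices` and `ddgroup_grid`, the seed handling, and the rounding used to turn p into a core group size are all assumptions, and the γ1 grid is read as the powers of two from 2^-4 to 2^5.

```python
import numpy as np


def split_indices(n, seed=0):
    """Randomly partition range(n) into train/test/validation index arrays
    with the 50% / 30% / 20% ratio quoted in the table.
    (Hypothetical helper; the seed and integer rounding are assumptions.)"""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = n // 2            # 50% for training
    n_test = (3 * n) // 10      # 30% for testing
    train = perm[:n_train]
    test = perm[n_train:n_train + n_test]
    val = perm[n_train + n_test:]   # remaining ~20% for validation
    return train, test, val


def ddgroup_grid(n_train):
    """Enumerate the DDGroup hyperparameter settings described in the table:
    gamma2 fixed at 0, gamma1 over powers of two (assumed 2^-4 .. 2^5),
    and core group size k = p * n_train for the listed values of p.
    (Hypothetical helper; rounding k to an integer is an assumption.)"""
    gamma1_values = [2.0 ** e for e in range(-4, 6)]
    p_values = [0.01, 0.05, 0.1, 0.15, 0.2]
    grid = []
    for gamma1 in gamma1_values:
        for p in p_values:
            k = max(1, round(p * n_train))
            grid.append({"gamma1": gamma1, "gamma2": 0.0, "p": p, "k": k})
    return grid


if __name__ == "__main__":
    train, test, val = split_indices(1000)
    print(len(train), len(test), len(val))
    print(len(ddgroup_grid(len(train))), "candidate settings")
```

Under these assumptions the grid has 10 values of γ1 times 5 values of p. How the best setting is chosen among them is not stated in the table, so the sketch stops at enumeration.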