reproducibilityindex.ai

Data-Driven Subgroup Identification for Linear Regression

Authors: Zachary Izzo, Ruishan Liu, James Zou

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	experiments on real-world medical datasets confirm that it can discover regions where a local linear model has improved performance.
Researcher Affiliation	Academia	(1) Department of Mathematics, Stanford University, USA; (2) Department of Biomedical Data Science, Stanford University, USA.
Pseudocode	Yes	Algorithm 1 COREGROUP(k, D) ... Algorithm 2 GROWBOX(x, Xrej, U) ... Algorithm 3 DDSUBGROUP(k, U, D)
Open Source Code	Yes	Code for the experiments can be found at https://github.com/zleizzo/DDGroup.
Open Datasets	Yes	Brazil Health Dataset (Cavalcante et al., 2018) ... China Glucose Dataset (Wang et al., 2017) ... China HIV Dataset (Zhang et al., 2016) ... Dutch Drinking dataset (Boelema et al., 2015) ... Korea Grip Dataset (Wen et al., 2017)
Dataset Splits	Yes	For the real-world datasets, we randomly split them into training, test and validation sets, with ratio 50%, 30% and 20%.
Hardware Specification	Yes	The average runtime for Algorithm 3 across one run of each dataset was 1.98 seconds on an AMD 7502 CPU, and no individual dataset took longer than 10 seconds.
Software Dependencies	No	The paper provides a link to its code repository, but it does not explicitly list software dependencies with specific version numbers within the text.
Experiment Setup	Yes	For DDGroup, we used a more general form of the threshold $\rho_{\gamma_1,\gamma_2}(x_i) = \sigma\gamma_1 x_i + \sigma\gamma_2$ and tuned $\gamma_1$ and $\gamma_2$ as additional hyperparameters. Specifically, the algorithm works well by simply setting $\gamma_2 = 0$ and tuning $\gamma_1 \in \{2^{-4}, 2^{-3}, \ldots, 2^{5}\}$. We also set the size $k$ of the core group equal to $p$ times the size of the training set, where $p$ was selected from within $\{0.01, 0.05, 0.1, 0.15, 0.2\}$. ... We did a hyperparameter search over constant rejection thresholds $\rho \in \{2, 4, 8, 16, 32, 64\}$. The core group size was always chosen to be $k = n/20$. ... We did a hyperparameter search over $\delta \in \{0.1, 0.05, 0.025, 0.01\}$.
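
The split ratios and search grids quoted in the table can be written down as a short Python sketch. This is an illustration only, not code from the DDGroup repository: the function names `split_indices` and `ddgroup_grid`, the fixed seed, and the rounding of split sizes and of k are all assumptions, and the γ1 grid is assumed to run over powers of two from 2^-4 to 2^5. The excerpt does not say which methods the ρ and δ grids belong to, so they appear only as plain constants.

```python
import itertools
import random

def split_indices(n, ratios=(0.5, 0.3, 0.2), seed=0):
    """Shuffle range(n) and cut it into train/test/validation index lists.

    The 50/30/20 ratios follow the quoted text; the seed and the rounding
    rule are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = round(ratios[0] * n)
    n_test = round(ratios[1] * n)
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    val = idx[n_train + n_test:]  # whatever remains (about 20%)
    return train, test, val

# DDGroup search space as quoted: gamma2 fixed at 0, gamma1 over powers of two.
GAMMA1_GRID = [2.0 ** e for e in range(-4, 6)]  # assumed: 2^-4, ..., 2^5
GAMMA2 = 0.0
P_GRID = [0.01, 0.05, 0.1, 0.15, 0.2]  # core-group fraction p

# Other quoted grids; the methods they apply to are elided in the excerpt.
RHO_GRID = [2, 4, 8, 16, 32, 64]
DELTA_GRID = [0.1, 0.05, 0.025, 0.01]

def ddgroup_grid(n_train):
    """Yield one config dict per (gamma1, p) pair, with k = p * n_train.

    Rounding k to an integer of at least 1 is an assumption.
    """
    for g1, p in itertools.product(GAMMA1_GRID, P_GRID):
        yield {"gamma1": g1, "gamma2": GAMMA2, "k": max(1, round(p * n_train))}
```

Calling `split_indices(n)` once per dataset and passing `len(train)` to `ddgroup_grid` enumerates 50 candidate configurations, from which a configuration would presumably be chosen using the validation split.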