Recursive Partitioning for Personalization using Observational Data

Authors: Nathan Kallus

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4. Empirical Investigation: We conclude with an empirical investigation of personalization using observational data and our new algorithms."
Researcher Affiliation | Academia | "School of Operations Research and Information Engineering and Cornell Tech, Cornell University."
Pseudocode | Yes | Algorithm 1 (PT subroutine), Algorithm 2 (PF), Algorithm 3 (OPT, complete binary tree)
Open Source Code | No | The paper does not provide a statement or link indicating the release of source code for the described methodology.
Open Datasets | Yes | "The baseline data collected on each patient include demographic characteristics (sex, ethnicity, age, weight, height, and smoker), reason for treatment (e.g., atrial fibrillation), current medications, co-morbidities (e.g., diabetes), genotype of two polymorphisms in CYP2C9, and genotype of seven single nucleotide polymorphisms (SNPs) in VKORC1. The correct stable therapeutic dose of warfarin, determined by adjustment over a few weeks, is recorded for each patient and segmented into three dose groups: low (≤ 21 mg/week, t = 1), medium (> 21, < 49 mg/week, t = 2), and high (≥ 49 mg/week, t = 3). The dataset was also studied in an online (bandit) setting in (Bastani & Bayati, 2016), where an R&C approach is analyzed. ... We use data from the National Supported Work Demonstration (LaLonde, 1986) (combining the experimental sample of 465 subjects with the 2490 PSID controls to create an observational dataset)."
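The three warfarin dose groups quoted above follow a simple thresholding rule on the stable weekly dose. As an illustrative sketch (the function name and structure are ours, not the paper's), the segmentation can be written as:

```python
def dose_group(weekly_dose_mg):
    """Map a stable warfarin dose (mg/week) to the paper's three groups:
    low (<= 21, t=1), medium (> 21 and < 49, t=2), high (>= 49, t=3)."""
    if weekly_dose_mg <= 21:
        return 1  # low
    elif weekly_dose_mg < 49:
        return 2  # medium
    return 3      # high
```

For example, `dose_group(35)` falls in the medium group (t = 2), while the boundary values 21 and 49 land in the low and high groups, respectively.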
Dataset Splits | Yes | "for each n = 100, 200, ..., 2500, we consider 100 replications in which we randomly select n training subjects and ntest = 2500 test subjects (disjoint, without replacement)"; "parameters tuned on 25% holdout validation as in Swaminathan & Joachims, 2015a;b"
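The quoted split procedure draws disjoint train and test sets without replacement for each replication. A minimal sketch of one such draw, assuming the data are indexed 0..N-1 (the helper name and seed are ours):

```python
import numpy as np

def disjoint_split(n_total, n_train, n_test, rng):
    """Draw disjoint train/test index sets without replacement,
    as in the paper's replication scheme."""
    idx = rng.permutation(n_total)
    return idx[:n_train], idx[n_train:n_train + n_test]

rng = np.random.default_rng(0)
train_idx, test_idx = disjoint_split(5000, 100, 2500, rng)
```

Each of the 100 replications would repeat this draw with a fresh permutation, holding ntest = 2500 fixed while n varies over 100, 200, ..., 2500.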
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running experiments, such as CPU/GPU models, memory, or cloud specifications.
Software Dependencies | No | The paper mentions the R packages glmnet and gradient.forest but does not specify their version numbers, nor the versions of any other software dependencies crucial for reproduction.
Experiment Setup | Yes | "We test standard R&C using four predictive models: OLS, logistic regression, CART (scikit-learn defaults), and kNN (k = √n). We compare these to our three direct personalization methods: PT (nmin-leaf = 20, max depth = ∞, #features = d), PF (T = 500, nmin-leaf = 10, max depth = ∞, #features = d), and OPT (nmin-leaf = 20, #features = d, #cuts = 10, depth = 2 + I[n ≥ 300], MIP solve time limited to 1 hour). We also compare to our 1vA strategy using Athey & Imbens (2016)'s CT-A (adaptive) and CT-H (honest with 50-50 split) and to IPOEM and INPOEM (parameters tuned on 25% holdout validation as in Swaminathan & Joachims, 2015a;b) with GPS imputed by cross-validated ℓ1-regularized multinomial regression using R package glmnet. ... we omit logistic regression (outcomes not binary), use nmin-leaf = 10 for PT and OPT and nmin-leaf = 1 for PF, use depth = 4 for OPT and let the MIP solve for 24 hours, use logistic regressions to impute GPS for IPOEM and INPOEM, and include the causal forest (CF) extension (Wager & Athey, 2017) of CT as implemented by the R package gradient.forest."
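The setup quoted above imputes generalized propensity scores (GPS) with cross-validated ℓ1-regularized multinomial regression via R's glmnet. As a rough illustrative sketch of the same idea (not the paper's code), a scikit-learn analogue with synthetic stand-in data looks like:

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# Hypothetical stand-in data: X = baseline features, t = treatment in {1, 2, 3}.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
t = rng.integers(1, 4, size=300)

# Cross-validated l1-regularized multinomial regression (a glmnet analogue);
# saga is the scikit-learn solver that supports the l1 penalty here.
gps_model = LogisticRegressionCV(penalty="l1", solver="saga",
                                 Cs=5, cv=3, max_iter=2000)
gps_model.fit(X, t)

# Estimated GPS: one probability column per treatment arm.
gps = gps_model.predict_proba(X)
```

The estimated GPS matrix has one row per subject and one column per treatment arm, with rows summing to one; these probabilities would then feed the IPOEM/INPOEM weighting.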