Recursive Partitioning for Personalization using Observational Data
Authors: Nathan Kallus
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Empirical Investigation: We conclude with an empirical investigation of personalization using observational data and our new algorithms. |
| Researcher Affiliation | Academia | 1School of Operations Research and Information Engineering and Cornell Tech, Cornell University. |
| Pseudocode | Yes | Algorithm 1 PT subroutine, Algorithm 2 PF, Algorithm 3 OPT (complete binary tree) |
| Open Source Code | No | The paper does not provide a direct statement or link indicating the release of its own source code for the described methodology. |
| Open Datasets | Yes | The baseline data collected on each patient include demographic characteristics (sex, ethnicity, age, weight, height, and smoker), reason for treatment (e.g., atrial fibrillation), current medications, co-morbidities (e.g., diabetes), genotype of two polymorphisms in CYP2C9, and genotype of seven single nucleotide polymorphisms (SNPs) in VKORC1. The correct stable therapeutic dose of warfarin, determined by adjustment over a few weeks, is recorded for each patient and segmented into three dose groups: low (≤ 21 mg/week, t = 1), medium (> 21, < 49 mg/week, t = 2), and high (≥ 49 mg/week, t = 3). The dataset was also studied in an online (bandit) setting in (Bastani & Bayati, 2016) where an R&C approach is analyzed. ... We use data from the National Supported Work Demonstration (LaLonde, 1986) (combining the experimental sample of 465 subjects with the 2490 PSID controls to create an observational dataset). |
| Dataset Splits | Yes | for each n = 100, 200, . . . , 2500, we consider 100 replications in which we randomly select n training subjects and ntest = 2500 test subjects (disjoint, without replacement). ... parameters tuned on 25% holdout validation as in Swaminathan & Joachims, 2015a;b |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running experiments, such as CPU/GPU models, memory, or cloud specifications. |
| Software Dependencies | No | The paper mentions 'R package glmnet' and 'R package gradient.forest' but does not specify their version numbers or the versions of any other software dependencies crucial for reproduction. |
| Experiment Setup | Yes | We test standard R&C using four predictive models: OLS, logistic regression, CART (scikit-learn defaults), and kNN (k = √n). We compare these to our three direct personalization methods: PT (nmin-leaf = 20, depthmax = ∞, #features = d), PF (T = 500, nmin-leaf = 10, depthmax = ∞, #features = d), and OPT (nmin-leaf = 20, #features = d, #cuts = 10, depth = 2 + I[n ≥ 300], MIP solve time limited to 1 hour). We also compare to our 1vA strategy using Athey & Imbens (2016)'s CT-A (adaptive) and CT-H (honest with 50-50 split) and to IPOEM and INPOEM (parameters tuned on 25% holdout validation as in Swaminathan & Joachims, 2015a;b) with GPS imputed by cross-validated ℓ1-regularized multinomial regression using R package glmnet. ... we omit logistic regression (outcomes not binary), use nmin-leaf = 10 for PT and OPT and nmin-leaf = 1 for PF, use depth = 4 for OPT and let the MIP solve for 24 hours, use logistic regressions to impute GPS for IPOEM and INPOEM, and include the causal forest (CF) extension (Wager & Athey, 2017) of CT as implemented by the R package gradient.forest. |
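The warfarin dose segmentation quoted in the Open Datasets row maps each patient's stable weekly dose to one of three treatment labels. A minimal sketch of that mapping, assuming the low group is inclusive of 21 mg/week and the high group inclusive of 49 mg/week (the medium group is quoted as strictly between the two cutoffs); the function name is illustrative, not from the paper:

```python
def dose_group(weekly_dose_mg: float) -> int:
    """Map a stable weekly warfarin dose to the paper's three treatment labels:
    low (<= 21 mg/week, t=1), medium (> 21, < 49 mg/week, t=2), high (>= 49 mg/week, t=3)."""
    if weekly_dose_mg <= 21:
        return 1
    if weekly_dose_mg < 49:
        return 2
    return 3
```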
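The split protocol in the Dataset Splits row (100 replications, each drawing n training subjects and 2,500 disjoint test subjects without replacement) can be sketched as follows; the function name and seed are illustrative, not from the paper:

```python
import numpy as np

def replicate_splits(n_total, n_train, n_test=2500, n_reps=100, seed=0):
    """For each replication, draw disjoint train/test index sets
    without replacement from a pool of n_total subjects."""
    rng = np.random.default_rng(seed)
    splits = []
    for _ in range(n_reps):
        idx = rng.permutation(n_total)  # a fresh shuffle per replication
        splits.append((idx[:n_train], idx[n_train:n_train + n_test]))
    return splits

# e.g. n = 100 training subjects; the pool size here is a stand-in
splits = replicate_splits(n_total=5000, n_train=100)
```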
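The GPS imputation step in the Experiment Setup row (cross-validated ℓ1-regularized multinomial regression) was done in the paper with the R package glmnet; a rough scikit-learn stand-in using `LogisticRegressionCV` on toy data is sketched below. The covariates, treatment labels, and hyperparameter grid are all illustrative assumptions, not the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

# Toy covariates X and a 3-level treatment T (hypothetical stand-ins).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
T = rng.integers(0, 3, size=300)

# Cross-validated l1-penalized multinomial logistic regression;
# the paper used R's glmnet, this sklearn estimator is an approximation.
model = LogisticRegressionCV(
    Cs=5, cv=5, penalty="l1", solver="saga", max_iter=2000
).fit(X, T)

# Generalized propensity scores: one column of P(T = t | X) per treatment.
gps = model.predict_proba(X)
```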