BoXHED: Boosted eXact Hazard Estimator with Dynamic covariates

Authors: Xiaochen Wang, Arash Pakbin, Bobak Mortazavi, Hongyu Zhao, Donald Lee

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the performance of BoXHED on simulation experiments, and also use it to analyze a cardiovascular disease dataset from the Framingham Heart Study. ... Table 1 presents the L2-errors for the hazard estimators when applied to the simulated datasets. ... Figure 2a presents the AUCt results for the estimators when applied to data simulated from λ1 (no irrelevant covariates).
Researcher Affiliation | Academia | 1 Biostatistics Department, Yale University, New Haven, Connecticut, USA; 2 Computer Science & Engineering, Texas A&M University, College Station, Texas, USA; 3 Goizueta Business School and Department of Biostatistics & Bioinformatics, Emory University, Atlanta, Georgia, USA.
Pseudocode | Yes | Algorithm 1 describes the BoXHED algorithm for estimating λ(t, x).
Open Source Code | Yes | BoXHED is available from GitHub: www.github.com/BoXHED.
Open Datasets | Yes | We pool together longitudinal records from two prospective cohorts: The Framingham Heart Study original cohort (FHS) and the Framingham Heart Study Offspring Cohort (FHS-OS) (Dawber et al., 1951).
Dataset Splits | Yes | The number of boosting iterations M as well as the maximum number of splits L in each tree are hyperparameters that are chosen via K-fold cross-validation. ... The 9,697 study participants are randomly split into 7,000/2,697 for training/testing.
Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU models, CPU models, or cloud computing instances with specifications) for running its experiments.
Software Dependencies | No | The current version (1.0) is written in Python and uses regression trees as learners. The paper mentions Python but does not provide specific version numbers for Python or any other software dependencies or libraries used for the experiments.
Experiment Setup | Yes | Here, M is the number of boosting iterations, and the default learning rate ν = 0.1 is commonly used in boosting applications. The number of boosting iterations M as well as the maximum number of splits L in each tree are hyperparameters that are chosen via K-fold cross-validation. ... The estimated hazard surfaces are scaled to [0, 1] and are aggregated into four clusters using K-means clustering... SBP is bucketed into quintiles (<115, 115-124, 125-139, 140-149, and ≥150 mm Hg), and DBP is bucketed in the same way (<70, 70-79, 80-84, 85-89, ≥90 mm Hg).
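
The split and bucketing steps quoted in the table can be illustrated with a minimal sketch. This is not the authors' code. It assumes hypothetical integer participant IDs in place of the real Framingham records, an arbitrary random seed, and that the last blood-pressure bucket is open-ended (150 mm Hg and above for SBP, 90 mm Hg and above for DBP). The helper names `bucket` and `split_participants` are invented for illustration.

```python
import random
from bisect import bisect_right

# Lower edges of buckets 2..5, read off the ranges quoted in the table.
# Assumption: the last bucket is open-ended (>= 150 and >= 90 mm Hg).
SBP_EDGES = [115, 125, 140, 150]
DBP_EDGES = [70, 80, 85, 90]


def bucket(value, edges):
    """Return the 0-based bucket index of a reading, given sorted lower edges."""
    # bisect_right counts how many lower edges are <= value,
    # so a reading equal to an edge lands in the bucket that edge opens.
    return bisect_right(edges, value)


def split_participants(ids, n_train, seed=0):
    """Randomly split participant IDs into (train, test) lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical participant IDs 0..9696 stand in for the real records.
train_ids, test_ids = split_participants(range(9697), n_train=7000)
print(len(train_ids), len(test_ids))

# Bucket indices for a few example readings (mm Hg).
print(bucket(114, SBP_EDGES), bucket(115, SBP_EDGES), bucket(150, SBP_EDGES))
print(bucket(69, DBP_EDGES), bucket(84, DBP_EDGES), bucket(90, DBP_EDGES))
```

Splitting on participant IDs rather than on individual records keeps all longitudinal measurements of one person on the same side of the train/test boundary, which matches the table's statement that participants, not records, are split.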