Evaluating Trauma Patients: Addressing Missing Covariates with Joint Optimization

Authors: Alex Van Esbroeck, Satinder Singh, Ilan Rubinfeld, Zeeshan Syed

AAAI 2014

Reproducibility assessment. Each variable below is listed with its result, followed by the LLM response that supports it.
Research Type: Experimental
LLM Response: "We investigate the utility of this approach on the prediction of several patient outcomes in a large national registry of trauma patients, and find that it significantly outperforms standard sequential methods." "We compare the proposed method for joint optimization to standard sequential learning when predicting several important patient outcomes in a large national registry of trauma patients." "We demonstrate in a representative cohort of patients that our method provides significant improvement across several metrics." "The National Trauma Data Bank (NTDB) collects information about patients and outcomes from trauma centers around the country." "The dataset consisted of 162,821 records." "For evaluation, the dataset was randomly divided into equal sized training, validation, and testing sets." "Classification performance was measured using the area under the receiver operating characteristic curve (AUC)."
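For context on the quoted evaluation protocol, here is a minimal sketch of an AUC-based evaluation in the style the paper describes. The data, model, and split sizes below are synthetic stand-ins, not the paper's:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                          # stand-in covariates
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)    # stand-in binary outcome

# fit on a training portion, score the held-out portion
clf = LogisticRegression(max_iter=1000).fit(X[:600], y[:600])
scores = clf.predict_proba(X[600:])[:, 1]                # probability of positive outcome
print("test AUC:", roc_auc_score(y[600:], scores))
```
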
Researcher Affiliation: Collaboration
LLM Response: Alex Van Esbroeck (1), Satinder Singh (1), Ilan Rubinfeld (2), Zeeshan Syed (1); (1) Computer Science & Engineering, University of Michigan, Ann Arbor, MI; (2) Henry Ford Hospital, Detroit, MI.
Pseudocode: Yes
LLM Response: "Algorithm 1: Alternating optimization of Equation 5"
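The paper's Algorithm 1 alternates between the data-model parameters θ (a GMM) and the classifier weights w (l2-regularized logistic regression). Below is a simplified sketch of that alternating loop, assuming a naive marginal-mean imputation step and omitting the α-weighted coupling of the classification loss into the θ update, so the GMM refit reduces to a plain EM-style step; the helper names and the imputation rule are assumptions, not the paper's exact procedure:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture

def impute_missing(X, gmm):
    """Fill NaN entries using the GMM's marginal mean.
    Simplification: the paper conditions on the observed covariates;
    here we use the unconditional mixture mean for brevity."""
    X_filled = X.copy()
    marginal_mean = gmm.weights_ @ gmm.means_   # (d,) overall mean of the mixture
    for j in range(X.shape[1]):
        mask = np.isnan(X_filled[:, j])
        X_filled[mask, j] = marginal_mean[j]
    return X_filled

def alternating_fit(X, y, K=5, C=1.0, alpha=1.0, n_iter=5, seed=0):
    """Sketch of Algorithm 1: alternate updates of theta (the GMM) and w
    (logistic regression). alpha weights the classification term in
    Equation 5; this simplified sketch does not couple it into the
    theta update, so the GMM refit is a plain EM-style step."""
    X0 = np.where(np.isnan(X), np.nanmean(X, axis=0), X)          # initial completion
    gmm = GaussianMixture(n_components=K, random_state=seed).fit(X0)
    clf = LogisticRegression(C=C, max_iter=1000)
    for _ in range(n_iter):              # paper: converged in fewer than 5 iterations
        X_hat = impute_missing(X, gmm)   # complete the data under theta
        clf.fit(X_hat, y)                # update w given theta
        gmm.fit(X_hat)                   # update theta given the completed data
    return gmm, clf
```
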
Open Source Code: No
LLM Response: The paper provides no information about releasing open-source code and no link to a code repository.
Open Datasets: No
LLM Response: "The National Trauma Data Bank (NTDB) collects information about patients and outcomes from trauma centers around the country." "The NTDB National Sample Program data from 2009 was used under IRB approval and the data use agreement of the American College of Surgeons." These excerpts describe the data source but give no direct link, DOI, or repository name for public access; although a national program is referenced, no concrete access details are offered to third parties.
Dataset Splits: Yes
LLM Response: "The two methods were compared over 20 random splits of the data into training (60%), validation (20%), and testing (20%) sets." "For evaluation, the dataset was randomly divided into equal sized training, validation, and testing sets." "The validation set was used to select α, the number of GMM components, and the classification regularization parameter."
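A minimal sketch of the quoted 60/20/20 protocol with 20 repetitions follows; the splitting mechanics (e.g., whether splits were stratified by outcome) are assumptions, as the excerpts do not specify them:

```python
import numpy as np

def random_splits(n, n_repeats=20, frac=(0.6, 0.2, 0.2), seed=0):
    """Yield (train, val, test) index arrays for repeated random splits."""
    rng = np.random.default_rng(seed)
    n_train, n_val = int(frac[0] * n), int(frac[1] * n)
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        yield perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

# 20 repetitions over the 162,821-record NTDB sample, 60/20/20 as quoted above
for train_idx, val_idx, test_idx in random_splits(162_821):
    pass  # fit on train_idx, tune alpha/K/C on val_idx, report test AUC
```
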
Hardware Specification: Yes
LLM Response: "Joint optimization converged in fewer than 5 iterations (about 15 minutes to train on 50,000 examples using a 4-core Intel Xeon processor)."
Software Dependencies: No
LLM Response: The paper mentions implementing Algorithm 1 using "l2 regularized log loss (logistic regression) for the classifier" and "GMMs for the data model", and states that "estimation of w given θ was done using standard methods for training a logistic regression model". However, it provides no version numbers for any software, libraries, or frameworks used (e.g., Python or scikit-learn).
Experiment Setup: Yes
LLM Response: "A validation set was used to select appropriate choices of α, the number of GMM components K, and the regularization parameter for the classifier." "We use alternating optimization of θ and w to optimize Equation 5." "We implemented Algorithm 1 using l2 regularized log loss (logistic regression) for the classifier, and use GMMs for the data model." "The optimization of Equation 5 is susceptible to local optima. As a result, we run the optimization with multiple random parameter initializations, and select the best model/classifier on a validation set."
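Tying the quoted setup together, here is a hedged sketch of validation-based selection over α, the number of GMM components K, and the classifier regularization strength, with multiple random initializations. The grids are hypothetical (the paper does not report the values searched), and the helpers alternating_fit and impute_missing refer to the sketch under the Pseudocode entry above:

```python
import itertools
import numpy as np
from sklearn.metrics import roc_auc_score

def select_model(X_tr, y_tr, X_val, y_val,
                 alphas=(0.1, 1.0, 10.0),    # hypothetical grids; the paper
                 Ks=(2, 5, 10),              # does not report the values searched
                 Cs=(0.01, 0.1, 1.0),
                 n_restarts=5):
    """Pick (alpha, K, C) and the best random restart by validation AUC,
    reusing alternating_fit / impute_missing from the sketch above."""
    best_auc, best_model = -np.inf, None
    for alpha, K, C in itertools.product(alphas, Ks, Cs):
        for seed in range(n_restarts):       # multiple random initializations
            gmm, clf = alternating_fit(X_tr, y_tr, K=K, C=C, alpha=alpha, seed=seed)
            X_val_hat = impute_missing(X_val, gmm)
            auc = roc_auc_score(y_val, clf.predict_proba(X_val_hat)[:, 1])
            if auc > best_auc:
                best_auc, best_model = auc, (alpha, K, C, seed, gmm, clf)
    return best_auc, best_model
```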