Evaluating Trauma Patients: Addressing Missing Covariates with Joint Optimization

Authors: Alex Van Esbroeck, Satinder Singh, Ilan Rubinfeld, Zeeshan Syed

AAAI 2014

Reproducibility assessment. Each variable below is listed with its result, followed by the LLM response that supports it.
Research Type: Experimental
LLM Response: "We investigate the utility of this approach on the prediction of several patient outcomes in a large national registry of trauma patients, and find that it significantly outperforms standard sequential methods." "We compare the proposed method for joint optimization to standard sequential learning when predicting several important patient outcomes in a large national registry of trauma patients." "We demonstrate in a representative cohort of patients that our method provides significant improvement across several metrics." "The National Trauma Data Bank (NTDB) collects information about patients and outcomes from trauma centers around the country." "The dataset consisted of 162,821 records." "For evaluation, the dataset was randomly divided into equal sized training, validation, and testing sets." "Classification performance was measured using the area under the receiver operating characteristic curve (AUC)."
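For context on the quoted evaluation protocol, here is a minimal sketch of an AUC-based evaluation in the style the paper describes. The data, model, and split sizes below are synthetic stand-ins, not the paper's:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                          # stand-in covariates
y = (X[:, 0] + rng.normal(size=1000) > 0).astype(int)    # stand-in binary outcome

# fit on a training portion, score the held-out portion
clf = LogisticRegression(max_iter=1000).fit(X[:600], y[:600])
scores = clf.predict_proba(X[600:])[:, 1]                # probability of positive outcome
print("test AUC:", roc_auc_score(y[600:], scores))
```
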
Researcher Affiliation: Collaboration
LLM Response: Alex Van Esbroeck (1), Satinder Singh (1), Ilan Rubinfeld (2), Zeeshan Syed (1); (1) Computer Science & Engineering, University of Michigan, Ann Arbor, MI; (2) Henry Ford Hospital, Detroit, MI.
Pseudocode: Yes
LLM Response: "Algorithm 1: Alternating optimization of Equation 5"
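The paper's Algorithm 1 alternates between the data-model parameters θ (a GMM) and the classifier weights w (l2-regularized logistic regression). Below is a simplified sketch of that alternating loop, assuming a naive marginal-mean imputation step and omitting the α-weighted coupling of the classification loss into the θ update, so the GMM refit reduces to a plain EM-style step; the helper names and the imputation rule are assumptions, not the paper's exact procedure:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture

def impute_missing(X, gmm):
    """Fill NaN entries using the GMM's marginal mean.
    Simplification: the paper conditions on the observed covariates;
    here we use the unconditional mixture mean for brevity."""
    X_filled = X.copy()
    marginal_mean = gmm.weights_ @ gmm.means_   # (d,) overall mean of the mixture
    for j in range(X.shape[1]):
        mask = np.isnan(X_filled[:, j])
        X_filled[mask, j] = marginal_mean[j]
    return X_filled

def alternating_fit(X, y, K=5, C=1.0, alpha=1.0, n_iter=5, seed=0):
    """Sketch of Algorithm 1: alternate updates of theta (the GMM) and w
    (logistic regression). alpha weights the classification term in
    Equation 5; this simplified sketch does not couple it into the
    theta update, so the GMM refit is a plain EM-style step."""
    X0 = np.where(np.isnan(X), np.nanmean(X, axis=0), X)          # initial completion
    gmm = GaussianMixture(n_components=K, random_state=seed).fit(X0)
    clf = LogisticRegression(C=C, max_iter=1000)
    for _ in range(n_iter):              # paper: converged in fewer than 5 iterations
        X_hat = impute_missing(X, gmm)   # complete the data under theta
        clf.fit(X_hat, y)                # update w given theta
        gmm.fit(X_hat)                   # update theta given the completed data
    return gmm, clf
```
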
Open Source Code: No
LLM Response: The paper provides no information about releasing open-source code and no link to a code repository.
Open Datasets: No
LLM Response: "The National Trauma Data Bank (NTDB) collects information about patients and outcomes from trauma centers around the country." "The NTDB National Sample Program data from 2009 was used under IRB approval and the data use agreement of the American College of Surgeons." These excerpts describe the data source but give no direct link, DOI, or repository name for public access; although a national program is referenced, no concrete access details are offered to third parties.
Dataset Splits: Yes
LLM Response: "The two methods were compared over 20 random splits of the data into training (60%), validation (20%), and testing (20%) sets." "For evaluation, the dataset was randomly divided into equal sized training, validation, and testing sets." "The validation set was used to select α, the number of GMM components, and the classification regularization parameter."
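A minimal sketch of the quoted 60/20/20 protocol with 20 repetitions follows; the splitting mechanics (e.g., whether splits were stratified by outcome) are assumptions, as the excerpts do not specify them:

```python
import numpy as np

def random_splits(n, n_repeats=20, frac=(0.6, 0.2, 0.2), seed=0):
    """Yield (train, val, test) index arrays for repeated random splits."""
    rng = np.random.default_rng(seed)
    n_train, n_val = int(frac[0] * n), int(frac[1] * n)
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        yield perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

# 20 repetitions over the 162,821-record NTDB sample, 60/20/20 as quoted above
for train_idx, val_idx, test_idx in random_splits(162_821):
    pass  # fit on train_idx, tune alpha/K/C on val_idx, report test AUC
```
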
Hardware Specification: Yes
LLM Response: "Joint optimization converged in fewer than 5 iterations (about 15 minutes to train on 50,000 examples using a 4-core Intel Xeon processor)."
Software Dependencies: No
LLM Response: The paper mentions implementing Algorithm 1 using "l2 regularized log loss (logistic regression) for the classifier" and "GMMs for the data model", and states that "estimation of w given θ was done using standard methods for training a logistic regression model". However, it provides no version numbers for any software, libraries, or frameworks used (e.g., Python or scikit-learn).
Experiment Setup: Yes
LLM Response: "A validation set was used to select appropriate choices of α, the number of GMM components K, and the regularization parameter for the classifier." "We use alternating optimization of θ and w to optimize Equation 5." "We implemented Algorithm 1 using l2 regularized log loss (logistic regression) for the classifier, and use GMMs for the data model." "The optimization of Equation 5 is susceptible to local optima. As a result, we run the optimization with multiple random parameter initializations, and select the best model/classifier on a validation set."
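Tying the quoted setup together, here is a hedged sketch of validation-based selection over α, the number of GMM components K, and the classifier regularization strength, with multiple random initializations. The grids are hypothetical (the paper does not report the values searched), and the helpers alternating_fit and impute_missing refer to the sketch under the Pseudocode entry above:

```python
import itertools
import numpy as np
from sklearn.metrics import roc_auc_score

def select_model(X_tr, y_tr, X_val, y_val,
                 alphas=(0.1, 1.0, 10.0),    # hypothetical grids; the paper
                 Ks=(2, 5, 10),              # does not report the values searched
                 Cs=(0.01, 0.1, 1.0),
                 n_restarts=5):
    """Pick (alpha, K, C) and the best random restart by validation AUC,
    reusing alternating_fit / impute_missing from the sketch above."""
    best_auc, best_model = -np.inf, None
    for alpha, K, C in itertools.product(alphas, Ks, Cs):
        for seed in range(n_restarts):       # multiple random initializations
            gmm, clf = alternating_fit(X_tr, y_tr, K=K, C=C, alpha=alpha, seed=seed)
            X_val_hat = impute_missing(X_val, gmm)
            auc = roc_auc_score(y_val, clf.predict_proba(X_val_hat)[:, 1])
            if auc > best_auc:
                best_auc, best_model = auc, (alpha, K, C, seed, gmm, clf)
    return best_auc, best_model
```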