Learning to Diagnose with LSTM Recurrent Neural Networks

Authors: Zachary Lipton, David Kale, Charles Elkan, Randall Wetzel

ICLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present the first study to empirically evaluate the ability of LSTMs to recognize patterns in multivariate time series of clinical measurements. Specifically, we consider multilabel classification of diagnoses, training a model to classify 128 diagnoses given 13 frequently but irregularly sampled clinical measurements. First, we establish the effectiveness of a simple LSTM network for modeling clinical data. Then we demonstrate a straightforward and effective training strategy in which we replicate targets at each sequence step. Trained only on raw time series, our models outperform several strong baselines, including a multilayer perceptron trained on hand-engineered features.
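
To make the quoted setup concrete, here is a minimal, hypothetical PyTorch sketch, not the authors' code: an LSTM reads the 13-variable clinical time series, a linear layer scores the 128 diagnosis labels at every time step, and the loss implements one plausible form of target replication by blending the per-step losses with the final-step loss. The class name `DiagnosisLSTM`, the blending weight `alpha`, and the choice of PyTorch itself are all assumptions; the paper does not name its implementation.

```python
# Hypothetical sketch of an LSTM for multilabel diagnosis classification
# with target replication. Not the authors' implementation; names and
# the exact loss weighting are assumptions.
import torch
import torch.nn as nn

N_INPUTS = 13    # clinical measurements per time step (from the paper)
N_LABELS = 128   # diagnosis labels (from the paper)

class DiagnosisLSTM(nn.Module):
    def __init__(self, hidden_size=64, num_layers=2, dropout=0.0):
        super().__init__()
        self.lstm = nn.LSTM(N_INPUTS, hidden_size, num_layers,
                            batch_first=True, dropout=dropout)
        self.out = nn.Linear(hidden_size, N_LABELS)

    def forward(self, x):
        # x: (batch, time, N_INPUTS) -> logits: (batch, time, N_LABELS)
        h, _ = self.lstm(x)
        return self.out(h)

def replicated_loss(logits, targets, alpha=0.5):
    # Target replication: the sequence-level label vector is copied to
    # every time step, and the mean per-step loss is blended with the
    # final-step loss. alpha is an assumed hyperparameter.
    bce = nn.BCEWithLogitsLoss()
    per_step = bce(logits, targets.unsqueeze(1).expand_as(logits))
    final = bce(logits[:, -1, :], targets)
    return alpha * per_step + (1.0 - alpha) * final
```

Replicating the targets at every step supplies an error signal to earlier portions of the sequence, which is the apparent intuition behind the training strategy the abstract describes.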
Researcher Affiliation | Academia | Zachary C. Lipton, Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA, zlipton@cs.ucsd.edu; David C. Kale, Department of Computer Science, University of Southern California, Los Angeles, CA 90089, dkale@usc.edu; Charles Elkan, Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA, elkan@cs.ucsd.edu; Randall Wetzel, Laura P. and Leland K. Whittier Virtual PICU, Children's Hospital Los Angeles, Los Angeles, CA 90027, rwetzel@chla.usc.edu
Pseudocode | No | The paper includes equations for the LSTM updates but does not provide any pseudocode or algorithm blocks that are explicitly labeled or formatted as structured algorithms.
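
For reference, LSTM update equations of the kind the paper presents take the following standard form (the paper's exact variant, e.g. bias terms or peephole connections, may differ):

```latex
\begin{aligned}
g^{(t)} &= \phi\left(W^{gx} x^{(t)} + W^{gh} h^{(t-1)} + b_g\right) \\
i^{(t)} &= \sigma\left(W^{ix} x^{(t)} + W^{ih} h^{(t-1)} + b_i\right) \\
f^{(t)} &= \sigma\left(W^{fx} x^{(t)} + W^{fh} h^{(t-1)} + b_f\right) \\
o^{(t)} &= \sigma\left(W^{ox} x^{(t)} + W^{oh} h^{(t-1)} + b_o\right) \\
s^{(t)} &= g^{(t)} \odot i^{(t)} + s^{(t-1)} \odot f^{(t)} \\
h^{(t)} &= \phi\left(s^{(t)}\right) \odot o^{(t)}
\end{aligned}
```

Here $\sigma$ is the logistic sigmoid, $\phi$ is tanh, $s^{(t)}$ is the memory cell state, and $\odot$ denotes elementwise multiplication.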
Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is open source or publicly available.
Open Datasets | No | Our experiments use a collection of anonymized clinical time series extracted from the EHR system at Children's Hospital LA (Marlin et al., 2012; Che et al., 2015) as part of an IRB-approved study. The paper describes an internal dataset and does not provide any access information (such as a URL, DOI, or repository) for public use.
Dataset Splits | Yes | All models are trained on 80% of the data and tested on 10%. The remaining 10% is used as a validation set.
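
A minimal sketch of the stated 80/10/10 partition follows. The shuffling, random seed, and absence of stratification are assumptions; the paper reports only the proportions.

```python
# Hypothetical 80/10/10 split matching the stated proportions; the
# paper does not specify shuffling, stratification, or seeds.
import random

def split_80_10_10(examples, seed=0):
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    n_train = int(0.8 * len(idx))
    n_test = int(0.1 * len(idx))
    train = [examples[i] for i in idx[:n_train]]
    test = [examples[i] for i in idx[n_train:n_train + n_test]]
    val = [examples[i] for i in idx[n_train + n_test:]]
    return train, test, val
```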
Hardware Specification | Yes | We acknowledge NVIDIA Corporation for Tesla K40 GPU hardware donation.
Software Dependencies | No | The paper mentions software components and techniques (e.g., LSTMs, SGD, dropout) but does not specify exact version numbers for any libraries, frameworks, or programming languages used in the implementation.
Experiment Setup | Yes | We train each LSTM for 100 epochs using stochastic gradient descent (SGD) with momentum. To combat exploding gradients, we scale the norm of the gradient and use ℓ2^2 weight decay of 10^-6, both hyperparameters chosen using validation data. Our final networks use 2 hidden layers and either 64 memory cells per layer with no dropout or 128 cells per layer with dropout of 0.5. These architectures are also chosen based on validation performance.
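
A hypothetical training loop matching the quoted setup, reusing `DiagnosisLSTM` and `replicated_loss` from the sketch above. The learning rate, momentum value, and gradient-norm threshold are assumptions; the quoted text fixes only the epoch count, the weight decay, and the two architectures.

```python
# Hypothetical training loop: SGD with momentum, gradient-norm scaling,
# weight decay of 1e-6, 100 epochs, and one of the two reported
# architectures. Unreported hyperparameter values are assumptions.
import torch
from torch import optim

model = DiagnosisLSTM(hidden_size=128, num_layers=2, dropout=0.5)
# Alternative reported architecture: hidden_size=64, dropout=0.0.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
                      weight_decay=1e-6)

# Toy stand-in for the (non-public) CHLA EHR data loader.
train_loader = [(torch.randn(8, 50, N_INPUTS),
                 torch.randint(0, 2, (8, N_LABELS)).float())
                for _ in range(4)]

for epoch in range(100):  # "100 epochs" per the paper
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = replicated_loss(model(x), y)
        loss.backward()
        # Rescale large gradients, one common way to "scale the norm of
        # the gradient"; the threshold is an assumption.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
```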