Not to Cry Wolf: Distantly Supervised Multitask Learning in Critical Care
Authors: Patrick Schwab, Emanuela Keller, Carl Muroi, David J. Mack, Christian Strässle, Walter Karlen
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our approach leads to significant improvements over several state-of-the-art baselines on real-world ICU data and provide new insights on the importance of task selection and architectural choices in distantly supervised multitask learning. |
| Researcher Affiliation | Academia | ¹Institute of Robotics and Intelligent Systems, ETH Zurich, Switzerland; ²Neurocritical Care Unit, Department of Neurosurgery, University Hospital Zurich, Switzerland. |
| Pseudocode | No | The paper describes network architectures and training procedures in text and diagrams, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code for this work is available online at https://github.com/d909b/DSMT-Nets. |
| Open Datasets | No | We collected biosignal monitoring data from January to August 2017 (8 months) from consenting patients admitted to the Neurocritical Care Unit at the University Hospital Zurich, Switzerland. The paper does not provide a link or citation for public access to this collected dataset. |
| Dataset Splits | No | We applied a random split stratified by alarm classification to the whole set of annotated alarms to separate the available data into a training (70%, 1244 alarms) and test set (30%, 533 alarms); a minimal sketch of such a stratified split appears after the table. While early stopping is mentioned, the paper does not specify a distinct validation set split or its size. |
| Hardware Specification | No | The paper mentions reducing computational requirements and lists hyperparameters for the neural networks, but it does not provide specific details about the hardware used (e.g., CPU or GPU models, memory). |
| Software Dependencies | No | The paper mentions various neural network architectures and models (e.g., ResNets, Highway Networks, Ladder Networks, GANs), but it does not specify software dependencies with version numbers (e.g., TensorFlow 1.x, PyTorch 0.x). |
| Experiment Setup | Yes | To ensure a fair comparison, we used a systematic approach to hyperparameter selection for each evaluated neural network. We trained each model 35 times with a random choice of the three variable hyperparameters bound to the same ranges (1–3 hidden layers, 16–32 units/filters per hidden layer, 25%–85% dropout). We reset the random seed to the same value for each model in order to make the search deterministic across training runs, i.e. all the models were evaluated on exactly the same set of hyperparameter values. To train the neural network models, we used a learning rate of 0.001 for the first ten epochs and 0.0001 afterwards to optimise the binary cross-entropy for the main classification output and the mean squared error for all auxiliary tasks. We additionally used early stopping with a patience of 13 epochs. For the extra hyperparameters in Ladder Networks, we set the noise level to be fixed at 0.2 at every layer, the denoising loss weight to 100 for the first hidden layer and to 0.1 for every following hidden layer. For the GAN models, we used a base learning rate of 0.0003 for the discriminator and a slightly increased learning rate of 0.003 for the generator to counteract the faster convergence of the discriminator networks. We trained GANs using an early stopping patience on the main loss of 650 steps for a minimum of 2500 steps. Minimal sketches of the deterministic hyperparameter search, the multitask training setup, and the GAN learning-rate configuration follow the table. |
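
The stratified 70/30 split reported under Dataset Splits can be illustrated with scikit-learn. This is a minimal sketch, not the authors' code: the feature matrix `X`, label vector `y`, and the fixed `random_state` are placeholder assumptions; only the stratification by alarm classification and the 1244/533 counts come from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the 1777 annotated alarms
# (features and labels here are random; only the counts and the
# stratified 70/30 split come from the paper).
rng = np.random.RandomState(0)
X = rng.randn(1777, 10)           # hypothetical feature vectors
y = rng.randint(0, 2, size=1777)  # hypothetical alarm classifications

# Random split stratified by alarm classification: 1244 train / 533 test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=533, stratify=y, random_state=0)

print(len(X_train), len(X_test))  # -> 1244 533
```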
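The deterministic hyperparameter search described in the Experiment Setup row (35 runs, with the seed reset so every model sees the same configurations) can be sketched as follows. The seed value and the choice of uniform sampling are assumptions; the paper states only the ranges and the reset-to-the-same-seed procedure.

```python
import random

def sample_configs(n_runs=35, seed=42):
    """Draw the same n_runs hyperparameter configurations on every call.

    Resetting the seed before sampling makes the search deterministic,
    so every model is evaluated on exactly the same configurations.
    The seed value and uniform sampling are assumptions, not from the paper.
    """
    rng = random.Random(seed)
    configs = []
    for _ in range(n_runs):
        configs.append({
            "num_hidden_layers": rng.randint(1, 3),  # 1-3 hidden layers
            "units_per_layer": rng.randint(16, 32),  # 16-32 units/filters
            "dropout": rng.uniform(0.25, 0.85),      # 25%-85% dropout
        })
    return configs

# Every model receives the identical list of 35 configurations.
assert sample_configs() == sample_configs()
```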
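The training setup itself (binary cross-entropy on the main classification output, mean squared error on the auxiliary tasks, a learning rate of 0.001 for the first ten epochs and 0.0001 afterwards, early stopping with a patience of 13 epochs) could be wired up in Keras roughly as below. This is a sketch, not the DSMT-Nets architecture: the toy network, the Adam optimizer, and the quantity monitored for early stopping are all assumptions.

```python
import tensorflow as tf
from tensorflow import keras

# Hypothetical toy network with one main output and one auxiliary output;
# the paper's actual architecture is more involved -- this only mirrors
# the stated losses, learning rate schedule, and early stopping.
inputs = keras.Input(shape=(10,))
hidden = keras.layers.Dense(32, activation="relu")(inputs)
hidden = keras.layers.Dropout(0.5)(hidden)
main_out = keras.layers.Dense(1, activation="sigmoid", name="main")(hidden)
aux_out = keras.layers.Dense(1, name="aux")(hidden)  # one auxiliary task

model = keras.Model(inputs, [main_out, aux_out])

# Binary cross-entropy on the main output, MSE on the auxiliary task,
# as quoted in the Experiment Setup row. Adam is an assumption.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss={"main": "binary_crossentropy", "aux": "mse"},
)

# Learning rate 0.001 for the first ten epochs, 0.0001 afterwards.
lr_schedule = keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-3 if epoch < 10 else 1e-4)

# Early stopping with a patience of 13 epochs (monitoring validation loss
# is an assumption; the paper does not name the monitored quantity).
early_stop = keras.callbacks.EarlyStopping(
    patience=13, restore_best_weights=True)

# model.fit(x, {"main": y_main, "aux": y_aux}, epochs=100,
#           validation_split=0.1, callbacks=[lr_schedule, early_stop])
```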
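Finally, the GAN-specific details (a base learning rate of 0.0003 for the discriminator, 0.003 for the generator, and step-based early stopping on the main loss with a patience of 650 steps after a minimum of 2500 steps) reduce to two optimizers and a stopping rule. Adam is again an assumption; the paper states only the learning rates.

```python
from tensorflow import keras

# Two optimizers with different learning rates: the generator's higher rate
# (0.003) counteracts the faster convergence of the discriminator (0.0003).
discriminator_optimizer = keras.optimizers.Adam(learning_rate=3e-4)
generator_optimizer = keras.optimizers.Adam(learning_rate=3e-3)

def should_stop(step, steps_since_best_main_loss,
                min_steps=2500, patience=650):
    """Step-based early stopping on the main loss, as quoted in the paper:
    train for at least min_steps, then stop once the main loss has not
    improved for patience consecutive steps."""
    return step >= min_steps and steps_since_best_main_loss >= patience
```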