Temporal Label Smoothing for Early Event Prediction

Authors: Hugo Yèche, Alizée Pace, Gunnar Rätsch, Rita Kuznetsova

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | By focusing the objective on areas with a stronger predictive signal, TLS improves performance over all baselines on two large-scale benchmark tasks. Gains are particularly notable along clinically relevant measures, such as event recall at low false-alarm rates.
Researcher Affiliation | Academia | ¹Department of Computer Science, ETH Zürich, Switzerland; ²ETH AI Center, ETH Zürich, Switzerland; ³Max Planck Institute for Intelligent Systems, Tübingen, Germany; ⁴Swiss Institute for Bioinformatics, Zürich, Switzerland.
Pseudocode | Yes |
```python
def get_smoothed_labels(event_label_patient, smoothing_fn,
                        h_true, h_min, h_max, **kwargs):
    # Find when the event label changes (0 -> 1 marks an event onset)
    diffs = np.concatenate(
        [np.zeros(1), event_label_patient[1:] - event_label_patient[:-1]],
        axis=-1)
    pos_event_change = np.where((diffs == 1) & (event_label_patient == 1))[0]

    # Handle patients with no events
    if len(pos_event_change) == 0:
        pos_event_change = np.array([np.inf])

    # Compute distance to the closest upcoming event for each time point
    time_array = np.arange(len(event_label_patient))
    dist_all_event = pos_event_change.reshape(-1, 1) - time_array
    dist_to_closest = np.where(dist_all_event > 0, dist_all_event,
                               np.inf).min(axis=0)

    return smoothing_fn(dist_to_closest, h_true=h_true, h_min=h_min,
                        h_max=h_max, **kwargs)
```
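For intuition, here is a minimal sketch of what a `smoothing_fn` passed to `get_smoothed_labels` could look like. The exponential decay, the `gamma` default, and the hard cut-off at `h_max` are our illustrative assumptions, not the paper's exact `q_exp` parametrization:

```python
import numpy as np

def exponential_smoothing(dist_to_closest, h_true, h_min, h_max, gamma=0.1):
    """Hypothetical exponential smoothing sketch (not the paper's q_exp).

    Time points within h_true of the next event keep a hard label of 1;
    beyond that, the label decays exponentially with strength gamma.
    h_min is accepted only for interface compatibility and unused here.
    """
    dist = np.asarray(dist_to_closest, dtype=float)
    smoothed = np.exp(-gamma * np.maximum(dist - h_true, 0.0))
    # Beyond the maximum horizon (including np.inf for event-free stays),
    # fall back to a hard 0 label.
    smoothed[dist > h_max] = 0.0
    return smoothed

# Distances to the next event for a synthetic stay (np.inf = no upcoming event).
dist = np.array([1.0, 3.0, 6.0, 13.0, np.inf])
smoothed = exponential_smoothing(dist, h_true=3, h_min=0, h_max=12)
```

With `h = 12` hours as in the circulatory-failure task, this yields hard labels near the event, a smooth decay in between, and zeros outside the horizon.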
Open Source Code | Yes | All code is made publicly available at https://github.com/ratschlab/tls.
Open Datasets | Yes | Our work is first evaluated on the prediction of acute circulatory failure within the next h = 12 hours, as defined in the HiRID-ICU-Benchmark (HiB) [26]. This task is based on the publicly available HiRID dataset [7]... We use the framework defined in the MIMIC-III Benchmark (M3B) [35] for the MIMIC-III dataset [37]...
Dataset Splits | No | Hyperparameters introduced by baselines or by our method, such as the strength term γ in the smoothing parametrization q_exp, are optimized through grid searches on the validation set.
Hardware Specification | Yes | We trained all models on a single NVIDIA RTX2080Ti with a Xeon E5-2630v4 core.
Software Dependencies | Yes | A full list of libraries and the versions we used is provided in the environment.yml file. The main libraries on which we build our experiments are the following: pytorch 1.11.0 [50], scikit-learn 0.24.1 [51], ignite 0.4.4, CUDA 10.2.89 [52], cuDNN 7.6.5 [53], gin-config 0.5.0 [54].
Experiment Setup | Yes | For all models, we set the batch size according to the available hardware capacity. Because transformers are memory-consuming, we train the decompensation models with a batch size of 8 stays. On the other hand, we train the GRU model for circulatory failure with a batch size of 64... Exact parameters are reported in Table 6.

Table 6: Hyperparameter search range for circulatory failure with GRU [39] backbone:
- Learning Rate: (1e-5, 3e-5, 1e-4, 3e-4)
- Drop-out: (0.0, 0.1, 0.2, 0.3, 0.4)
- Depth: (1, 2, 3)
- Hidden Dimension: (32, 64, 128, 256)
- L1 Regularization: (1e-2, 1e-1, 1, 10, 100)