Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].
Contimask: Explaining Irregular Time Series via Perturbations in Continuous Time
Authors: Max Moebus, Björn Braun, Christian Holz
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We consider 5 problem settings. We first convert two commonly used synthetic scenarios for regular time series explanations into the continuous time setting... We then adapt these two scenarios... We finish by explaining a model trained on a common problem for irregular time series models: sepsis prediction from hospital records... Metrics For the Rare Time & Rare Feature settings, ground truth saliency maps are available. We calculate the F1 score (F1), Precision (Prec), and Recall (Rec) for correctly identifying these maps... |
| Researcher Affiliation | Academia | Max Moebus, Björn Braun, and Christian Holz Department of Computer Science, ETH Zurich Zurich, Switzerland {max.moebus};{bjoern.braun};{christian.holz}@inf.ethz.ch |
| Pseudocode | No | The paper describes mathematical formulations for perturbations and objective functions but does not present them in a clearly labeled 'pseudocode' or 'algorithm' block. |
| Open Source Code | Yes | Source code is available on GitHub. |
| Open Datasets | Yes | We train a NCDE and mtan model on the sepsis prediction task as implemented by Kidger et al. [8]. We publicly share our code and all data is either synthetically created as part of the code we provide or publicly available online and we provide the download and processing scripts. |
| Dataset Splits | Yes | We only explain cases on the test set (5.4% mortality). Both models achieve a binary AUC of roughly 0.90 on a held-out test set (the same 20% split as per [8]). |
| Hardware Specification | Yes | We run all experiments using an H200 GPU needing at most 8GB of VRAM. All experiments were performed on a H200 GPU, where the used VRAM never exceeded 8 GB. |
| Software Dependencies | No | The paper mentions using 'PGPE algorithm [25, 7] as implemented in EvoTorch [35] using the ClipUp optimizer [34]' but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We set λ1 = 0.01, λ2 = 0.001 and train for 16,000 epochs using an Adam optimizer with a learning rate of 0.01, or 2000 iterations using the PGPE optimizer with a population size of 100. For PGPE, we initialize with a radius of 3, and a center learning rate of 0.5 ( 0.3). |
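
The Research Type row notes that F1, precision, and recall are computed against ground-truth saliency maps in the Rare Time and Rare Feature settings. The sketch below illustrates how such scores could be computed for a single flattened saliency map. It is not the authors' evaluation code: the function name `saliency_scores`, the flattened-list input format, and the 0.5 binarization threshold are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Sequence


def saliency_scores(
    predicted: Sequence[float],
    ground_truth: Sequence[int],
    threshold: float = 0.5,
) -> dict[str, float]:
    """Precision, recall, and F1 of a thresholded saliency map.

    `predicted` holds saliency values in [0, 1]; `ground_truth` holds 0/1 labels.
    Both are flattened to one entry per (time point, feature) pair.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("maps must have the same number of entries")

    # Binarize the predicted saliency. The 0.5 cut-off is an assumption,
    # not a value reported in the paper.
    selected = [p >= threshold for p in predicted]
    salient = [bool(g) for g in ground_truth]

    # Count true positives, false positives, and false negatives.
    tp = sum(s and g for s, g in zip(selected, salient))
    fp = sum(s and not g for s, g in zip(selected, salient))
    fn = sum(g and not s for s, g in zip(selected, salient))

    # Guard against empty denominators by defining the score as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy example: 3 truly salient entries, 3 entries selected, 2 overlapping.
    pred = [0.9, 0.8, 0.1, 0.7, 0.2, 0.0]
    truth = [1, 1, 1, 0, 0, 0]
    print(saliency_scores(pred, truth))
```

In the toy example, two of the three selected entries are truly salient and two of the three salient entries are found, so precision, recall, and F1 all equal 2/3. How the paper aggregates scores across samples or chooses a threshold is not stated in this summary.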