Discovering Interpretable Data-to-Sequence Generators
Authors: Boris Wiegand, Dietrich Klakow, Jilles Vreeken (pp. 4237–4244)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through an extensive set of experiments including a case study, we show that it ably discovers compact, interpretable and accurate models for the generation and prediction of event sequences from data, has a low sample complexity, and is particularly robust against noise. In this section, we evaluate CONSEQUENCE on both synthetic and real-world datasets. |
| Researcher Affiliation | Collaboration | Boris Wiegand (1,2), Dietrich Klakow (2), Jilles Vreeken (3): 1 SHS Stahl-Holding-Saar, Dillingen, Germany; 2 Saarland University, Saarbrücken, Germany; 3 CISPA Helmholtz Center for Information Security, Germany |
| Pseudocode | Yes | Algorithm 1: COVER an Event Sequence. For reference, we provide pseudo-code in the supplementary. (A hypothetical cover sketch follows below the table.) |
| Open Source Code | Yes | We make all code and data publicly available. We provide code and data for research purposes as well as details for reproducibility in the supplementary: http://eda.mmci.uni-saarland.de/prj/consequence |
| Open Datasets | Yes | Production (Levy 2014) is a collection of event sequences from a production process. Sepsis (Mannhardt 2016) contains trajectories of Sepsis patients in a Dutch hospital. Software, the fourth and last dataset in our comparison, is a profiling log of the Java program density-converter (Favre-Bulle 2020). |
| Dataset Splits | No | We ran BOOMER, POSCL, DATS, LSTM and CONSEQUENCE ten times on these datasets with a random train-test-split of 80%. The paper specifies an 80% train-test split but does not explicitly mention a separate validation split or how it was handled. (See the split sketch below the table.) |
| Hardware Specification | Yes | We ran all experiments on a server with two Intel(R) Xeon(R) Silver 4110 CPUs, 128 GB of RAM and two NVIDIA Tesla P100 GPUs. |
| Software Dependencies | No | The paper mentions various software components and baselines like 'LSTM', 'BOOMER', 'CANTMINERPB', 'CLASSY', 'DATS', and 'GERD', but it does not specify any version numbers for these or other ancillary software dependencies. |
| Experiment Setup | Yes | We ran BOOMER, POSCL, DATS, LSTM and CONSEQUENCE ten times on these datasets with a random train-test-split of 80%. For this experiment, we sampled instances from an artificial ground-truth model, split the dataset into a training set with 8000 instances and a test set with 2000 instances, and applied a noise model on the training data. We modified the A* search in a beam search fashion, where we only keep the w best candidates in each iteration. (See the beam search sketch below the table.) |
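The COVER pseudo-code itself appears only in the paper's supplementary material. As a rough, hypothetical sketch of what covering an event sequence with a set of patterns can look like, here is a greedy longest-match cover; the pattern representation and the greedy strategy are assumptions for illustration, not the authors' Algorithm 1:

```python
# Hypothetical sketch of covering an event sequence with patterns.
# The pattern representation and the greedy longest-match strategy are
# assumptions; the paper's Algorithm 1 may differ substantially.

def cover(sequence, patterns):
    """Greedily cover `sequence` with patterns, falling back to singletons.

    sequence: list of event symbols, e.g. ["a", "b", "c"]
    patterns: list of tuples of event symbols, e.g. [("a", "b")]
    Returns the list of patterns/singletons used, in order.
    """
    # Try longer patterns first so the cover prefers compact descriptions.
    patterns = sorted(patterns, key=len, reverse=True)
    result, i = [], 0
    while i < len(sequence):
        for p in patterns:
            if tuple(sequence[i:i + len(p)]) == p:
                result.append(p)
                i += len(p)
                break
        else:  # no pattern matched: emit the event as a singleton
            result.append((sequence[i],))
            i += 1
    return result

print(cover(list("ababc"), [("a", "b")]))
# [('a', 'b'), ('a', 'b'), ('c',)]
```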
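For the evaluation protocol quoted in the Dataset Splits row (ten runs, each with a random 80% train-test split and no separate validation split), a minimal sketch of the resampling loop, assuming scikit-learn's `train_test_split` and a hypothetical helper name:

```python
# Minimal sketch of the evaluation protocol: ten repetitions, each with a
# random 80/20 train-test split. No validation split is modeled, matching
# the paper's description. `run_repeated_splits` is a hypothetical helper.
from sklearn.model_selection import train_test_split

def run_repeated_splits(instances, n_runs=10, train_size=0.8):
    """Yield n_runs random train-test splits, one seed per run."""
    for seed in range(n_runs):
        yield train_test_split(instances, train_size=train_size,
                               random_state=seed)

# Example: each run trains on 80% of the data and tests on the rest.
data = list(range(100))
for train, test in run_repeated_splits(data):
    assert len(train) == 80 and len(test) == 20
```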
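The Experiment Setup row describes restricting A* search in a beam search fashion, keeping only the w best candidates in each iteration. A generic sketch of that pruning idea, with hypothetical `expand`, `score`, and `is_goal` callbacks standing in for the paper's actual search space:

```python
# Generic beam search sketch: like best-first search, but the frontier is
# pruned to the w best candidates per iteration. The callbacks are
# hypothetical stand-ins; this is not the authors' implementation.
import heapq

def beam_search(start, expand, score, is_goal, w=5):
    """Search from `start`, keeping only the w lowest-scoring states per level.

    expand(state) -> iterable of successor states
    score(state)  -> lower is better (stands in for the A* f-value)
    """
    frontier = [start]
    while frontier:
        candidates = []
        for state in frontier:
            if is_goal(state):
                return state
            candidates.extend(expand(state))
        # Prune: retain only the w most promising candidates.
        frontier = heapq.nsmallest(w, candidates, key=score)
    return None

# Toy usage: grow strings toward "abc", scoring by mismatches plus length gap.
goal = "abc"
result = beam_search(
    "",
    expand=lambda s: [s + c for c in "abc"] if len(s) < len(goal) else [],
    score=lambda s: sum(a != b for a, b in zip(s, goal)) + (len(goal) - len(s)),
    is_goal=lambda s: s == goal,
    w=3,
)
print(result)  # "abc"
```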