Discovering Interpretable Data-to-Sequence Generators
Authors: Boris Wiegand, Dietrich Klakow, Jilles Vreeken (pp. 4237–4244)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through an extensive set of experiments including a case study, we show that it ably discovers compact, interpretable and accurate models for the generation and prediction of event sequences from data, has a low sample complexity, and is particularly robust against noise. In this section, we evaluate CONSEQUENCE on both synthetic and real-world datasets. |
| Researcher Affiliation | Collaboration | Boris Wiegand (1,2), Dietrich Klakow (2), Jilles Vreeken (3): 1 SHS Stahl-Holding-Saar, Dillingen, Germany; 2 Saarland University, Saarbrücken, Germany; 3 CISPA Helmholtz Center for Information Security, Germany |
| Pseudocode | Yes | Algorithm 1: COVER an Event Sequence. For reference, we provide pseudo-code in the supplementary. (A hypothetical cover sketch follows below the table.) |
| Open Source Code | Yes | We make all code and data publicly available. We provide code and data for research purposes as well as details for reproducibility in the supplementary: http://eda.mmci.uni-saarland.de/prj/consequence |
| Open Datasets | Yes | Production (Levy 2014) is a collection of event sequences from a production process. Sepsis (Mannhardt 2016) contains trajectories of Sepsis patients in a Dutch hospital. Software, the fourth and last dataset in our comparison, is a profiling log of the Java program density-converter (Favre-Bulle 2020). |
| Dataset Splits | No | We ran BOOMER, POSCL, DATS, LSTM and CONSEQUENCE ten times on these datasets with a random train-test-split of 80%. The paper specifies an 80% train-test split but does not explicitly mention a separate validation split or how it was handled. (See the split sketch below the table.) |
| Hardware Specification | Yes | We ran all experiments on a server with two Intel(R) Xeon(R) Silver 4110 CPUs, 128 GB of RAM and two NVIDIA Tesla P100 GPUs. |
| Software Dependencies | No | The paper mentions various software components and baselines like 'LSTM', 'BOOMER', 'CANTMINERPB', 'CLASSY', 'DATS', and 'GERD', but it does not specify any version numbers for these or other ancillary software dependencies. |
| Experiment Setup | Yes | We ran BOOMER, POSCL, DATS, LSTM and CONSEQUENCE ten times on these datasets with a random train-test-split of 80%. For this experiment, we sampled instances from an artificial ground-truth model, split the dataset into a training set with 8000 instances and a test set with 2000 instances, and applied a noise model on the training data. We modified the A* search in a beam search fashion, where we only keep the w best candidates in each iteration. (See the beam search sketch below the table.) |
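The COVER pseudo-code itself appears only in the paper's supplementary material. As a rough, hypothetical sketch of what covering an event sequence with a set of patterns can look like, here is a greedy longest-match cover; the pattern representation and the greedy strategy are assumptions for illustration, not the authors' Algorithm 1:

```python
# Hypothetical sketch of covering an event sequence with patterns.
# The pattern representation and the greedy longest-match strategy are
# assumptions; the paper's Algorithm 1 may differ substantially.

def cover(sequence, patterns):
    """Greedily cover `sequence` with patterns, falling back to singletons.

    sequence: list of event symbols, e.g. ["a", "b", "c"]
    patterns: list of tuples of event symbols, e.g. [("a", "b")]
    Returns the list of patterns/singletons used, in order.
    """
    # Try longer patterns first so the cover prefers compact descriptions.
    patterns = sorted(patterns, key=len, reverse=True)
    result, i = [], 0
    while i < len(sequence):
        for p in patterns:
            if tuple(sequence[i:i + len(p)]) == p:
                result.append(p)
                i += len(p)
                break
        else:  # no pattern matched: emit the event as a singleton
            result.append((sequence[i],))
            i += 1
    return result

print(cover(list("ababc"), [("a", "b")]))
# [('a', 'b'), ('a', 'b'), ('c',)]
```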
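For the evaluation protocol quoted in the Dataset Splits row (ten runs, each with a random 80% train-test split and no separate validation split), a minimal sketch of the resampling loop, assuming scikit-learn's `train_test_split` and a hypothetical helper name:

```python
# Minimal sketch of the evaluation protocol: ten repetitions, each with a
# random 80/20 train-test split. No validation split is modeled, matching
# the paper's description. `run_repeated_splits` is a hypothetical helper.
from sklearn.model_selection import train_test_split

def run_repeated_splits(instances, n_runs=10, train_size=0.8):
    """Yield n_runs random train-test splits, one seed per run."""
    for seed in range(n_runs):
        yield train_test_split(instances, train_size=train_size,
                               random_state=seed)

# Example: each run trains on 80% of the data and tests on the rest.
data = list(range(100))
for train, test in run_repeated_splits(data):
    assert len(train) == 80 and len(test) == 20
```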
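The Experiment Setup row describes restricting A* search in a beam search fashion, keeping only the w best candidates in each iteration. A generic sketch of that pruning idea, with hypothetical `expand`, `score`, and `is_goal` callbacks standing in for the paper's actual search space:

```python
# Generic beam search sketch: like best-first search, but the frontier is
# pruned to the w best candidates per iteration. The callbacks are
# hypothetical stand-ins; this is not the authors' implementation.
import heapq

def beam_search(start, expand, score, is_goal, w=5):
    """Search from `start`, keeping only the w lowest-scoring states per level.

    expand(state) -> iterable of successor states
    score(state)  -> lower is better (stands in for the A* f-value)
    """
    frontier = [start]
    while frontier:
        candidates = []
        for state in frontier:
            if is_goal(state):
                return state
            candidates.extend(expand(state))
        # Prune: retain only the w most promising candidates.
        frontier = heapq.nsmallest(w, candidates, key=score)
    return None

# Toy usage: grow strings toward "abc", scoring by mismatches plus length gap.
goal = "abc"
result = beam_search(
    "",
    expand=lambda s: [s + c for c in "abc"] if len(s) < len(goal) else [],
    score=lambda s: sum(a != b for a, b in zip(s, goal)) + (len(goal) - len(s)),
    is_goal=lambda s: s == goal,
    w=3,
)
print(result)  # "abc"
```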