Explainable Data Decompositions
Authors: Sebastian Dalleiger, Jilles Vreeken | Pages: 3709-3716
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation on synthetic and real-world data shows that DISC efficiently discovers meaningful components and accurately characterises these in easily understandable terms. |
| Researcher Affiliation | Academia | Sebastian Dalleiger, Jilles Vreeken CISPA Helmholtz Center for Information Security {sebastian.dalleiger, jv}@cispa.de |
| Pseudocode | Yes | Algorithm 1: DESC for Describing the Composition and Algorithm 2: DISC for Discovering the Composition |
| Open Source Code | Yes | We provide the source code, datasets, synthetic dataset generator, and additional information needed for reproducibility (footnote 1: https://eda.mmci.uni-saarland.de/disc/). |
| Open Datasets | Yes | We provide the source code, datasets, synthetic dataset generator, and additional information needed for reproducibility (footnote 1: https://eda.mmci.uni-saarland.de/disc/). |
| Dataset Splits | No | The paper evaluates on synthetic and real-world datasets but does not explicitly provide details about train/validation/test splits (e.g., percentages, sample counts, or specific split methodologies) for reproduction. |
| Hardware Specification | Yes | We implemented DISC in C++, ran experiments on a 12-Core Intel Xeon E5-2643 CPU, and report wall-clock time. |
| Software Dependencies | No | The paper states 'We implemented DISC in C++' but does not provide specific version numbers for key software components, libraries, or solvers. |
| Experiment Setup | Yes | "In all experiments we have used the same significance level α = 0.01." and "Since DBSCAN relies on hyper-parameter, we optimize ℓ using a grid-search over 7 ϵ-candidates and we do not constraint cluster-sizes." |
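
The Experiment Setup row mentions tuning the DBSCAN baseline with a grid search over 7 ϵ-candidates. The following is a minimal sketch of what such a search could look like, not the authors' code. Everything specific in it is an assumption made for illustration: the toy 2-D points, the seven candidate values, `min_pts=3`, and above all the scoring function. The paper's objective ℓ is not reproduced here, so a mean silhouette coefficient stands in for it.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts=3):
    """Textbook DBSCAN: one label per point, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = region_query(points, j, eps)
            if len(reachable) >= min_pts:  # j is a core point, so expand from it
                queue.extend(reachable)
    return labels


def mean_silhouette(points, labels):
    """Stand-in score (NOT the paper's objective): mean silhouette coefficient.

    Noise points contribute 0. Fewer than two clusters returns -1.0, a
    convention chosen here so that degenerate clusterings rank last.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        if lab != NOISE:
            clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for lab, members in clusters.items():
        if len(members) == 1:
            continue  # a singleton cluster has silhouette 0
        for i in members:
            a = sum(dist(points[i], points[j]) for j in members if j != i)
            a /= len(members) - 1
            b = min(
                sum(dist(points[i], points[j]) for j in other) / len(other)
                for other_lab, other in clusters.items()
                if other_lab != lab
            )
            if max(a, b) > 0:
                total += (b - a) / max(a, b)
    return total / len(points)


def grid_search_eps(points, eps_candidates, score, min_pts=3):
    """Return (eps, labels) with the highest score; ties keep the earlier eps."""
    best = None
    for eps in eps_candidates:
        labels = dbscan(points, eps, min_pts)
        value = score(points, labels)
        if best is None or value > best[0]:
            best = (value, eps, labels)
    return best[1], best[2]


# Hypothetical toy data: two tight blobs and one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 20)]
# Seven illustrative candidates; the values used in the paper are not stated.
eps_candidates = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]

best_eps, best_labels = grid_search_eps(points, eps_candidates, mean_silhouette)
print(best_eps, best_labels)
```

Passing the score as a callable keeps the search independent of the objective, so a different criterion can be swapped in without touching the clustering code. Where scikit-learn is available, `sklearn.cluster.DBSCAN(eps=..., min_samples=...)` could replace the hand-written `dbscan` above.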