Explainable Data Decompositions

Authors: Sebastian Dalleiger, Jilles Vreeken
Pages: 3709-3716

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluation on synthetic and real-world data shows that DISC efficiently discovers meaningful components and accurately characterises these in easily understandable terms.
Researcher Affiliation | Academia | Sebastian Dalleiger, Jilles Vreeken; CISPA Helmholtz Center for Information Security; {sebastian.dalleiger, jv}@cispa.de
Pseudocode | Yes | Algorithm 1: DESC for Describing the Composition, and Algorithm 2: DISC for Discovering the Composition
Open Source Code | Yes | 'We provide the source code, datasets, synthetic dataset generator, and additional information needed for reproducibility.' (footnote 1: https://eda.mmci.uni-saarland.de/disc/)
Open Datasets | Yes | 'We provide the source code, datasets, synthetic dataset generator, and additional information needed for reproducibility.' (footnote 1: https://eda.mmci.uni-saarland.de/disc/)
Dataset Splits | No | The paper evaluates on synthetic and real-world datasets but does not explicitly provide details about train/validation/test splits (e.g., percentages, sample counts, or specific split methodologies) for reproduction.
Hardware Specification | Yes | 'We implemented DISC in C++, ran experiments on a 12-Core Intel Xeon E5-2643 CPU, and report wall-clock time.'
Software Dependencies | No | The paper states 'We implemented DISC in C++' but does not provide specific version numbers for key software components, libraries, or solvers.
Experiment Setup | Yes | 'In all experiments we have used the same significance level α = 0.01.' and 'Since DBSCAN relies on hyper-parameter, we optimize ℓ using a grid-search over 7 ϵ-candidates and we do not constraint cluster-sizes.'