reproducibilityindex.ai

Explainable Data Decompositions

Authors: Sebastian Dalleiger, Jilles Vreeken
Pages: 3709-3716

AAAI 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical evaluation on synthetic and real-world data shows that DISC efficiently discovers meaningful components and accurately characterises these in easily understandable terms.
Researcher Affiliation	Academia	Sebastian Dalleiger, Jilles Vreeken CISPA Helmholtz Center for Information Security {sebastian.dalleiger, jv}@cispa.de
Pseudocode	Yes	Algorithm 1: DESC for Describing the Composition and Algorithm 2: DISC for Discovering the Composition
Open Source Code	Yes	We provide the source code, datasets, synthetic dataset generator, and additional information needed for reproducibility. (Footnote 1: https://eda.mmci.uni-saarland.de/disc/)
Open Datasets	Yes	We provide the source code, datasets, synthetic dataset generator, and additional information needed for reproducibility. (Footnote 1: https://eda.mmci.uni-saarland.de/disc/)
Dataset Splits	No	The paper evaluates on synthetic and real-world datasets but does not explicitly provide details about train/validation/test splits (e.g., percentages, sample counts, or specific split methodologies) for reproduction.
Hardware Specification	Yes	We implemented DISC in C++, ran experiments on a 12-Core Intel Xeon E5-2643 CPU, and report wall-clock time.
Software Dependencies	No	The paper states 'We implemented DISC in C++' but does not provide specific version numbers for key software components, libraries, or solvers.
Experiment Setup	Yes	'In all experiments we have used the same significance level α = 0.01.' and 'Since DBSCAN relies on hyper-parameter, we optimize ℓ using a grid-search over 7 ϵ-candidates and we do not constraint cluster-sizes.'