reproducibilityindex.ai

Learning Exceptional Subgroups by End-to-End Maximizing KL-Divergence

Authors: Sascha Xu, Nils Philipp Walter, Janis Kalofolias, Jilles Vreeken

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We extensively evaluate SYFLOW on synthetic and real-world data. We show that SYFLOW accurately and reliably learns and characterizes exceptional subgroups, even for complex target distributions.
Researcher Affiliation	Academia	CISPA Helmholtz Center for Information Security, Saarbrücken, Germany. Correspondence to: Sascha Xu <sascha.xu@cispa.de>, Nils P. Walter <nils.walter@cispa.de>.
Pseudocode	Yes	We provide an overview diagram and the pseudo-code for SYFLOW in Appendix C. Appendix C contains Algorithm 1 (fit flow) and Algorithm 2 (SYFLOW).
Open Source Code	Yes	We give the hyperparameters in Appendix D and provide the data generators as well as the code online (https://eda.rg.cispa.io/prj/syflow/).
Open Datasets	Yes	We now turn to real-world data, where we evaluate on regression datasets from the UCI Machine Learning Repository (https://archive.ics.uci.edu). We conduct a preliminary experiment on the MNIST dataset (LeCun et al., 1998) using the digits 0 and 1 only.
Dataset Splits	No	The paper mentions running experiments multiple times (e.g., 'We run each experiment five times and report the average' and 'For SYFLOW we report the average over 100 runs and report the standard deviation in Tab. 3'), but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies	No	The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed for reproducibility.
Experiment Setup	Yes	For SYFLOW the hyperparameter setting is: t = 0.2, γ = 0.5, λ = 0.5, lr_Flow = 5 × 10^-2, lr_s = 2 × 10^-2, epochs_{Flow Y} = 2000 and epochs_{Flow Ys} = 1500. For SD-µ, SD-KL and RSD, we used 20 cutpoints and a beamwidth of 100, while γ is set to 1.0. For the experiments on Kaggle and UCI data (i.e., Section 5.2), we used for SYFLOW: t = 0.2, γ = 0.3, λ = 2.0, lr_Flow = 5 × 10^-2, lr_s = 2 × 10^-2, epochs_{Flow Y} = 1000 and epochs_{Flow Ys} = 1000.
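
The two SYFLOW settings quoted in the Experiment Setup row can be written down as a small configuration sketch, which makes it easy to see what changes between them. This is only an illustration: the class name, field names, and constant names below are hypothetical and are not taken from the SYFLOW code base. The learning rates are read as 5e-2 and 2e-2, on the assumption that the extracted text lost a multiplication sign and a negative exponent. The entry does not say which experiments the first setting applies to, so it is labelled generically.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SyflowSetting:
    """Hypothetical container for the reported hyperparameters."""
    t: float             # reported as "t"
    gamma: float         # reported as "γ"
    lam: float           # reported as "λ"
    lr_flow: float       # reported as "lr Flow" (assumed 5e-2)
    lr_s: float          # reported as "lrs" (assumed 2e-2)
    epochs_flow_y: int   # reported as "epochs Flow Y"
    epochs_flow_ys: int  # reported as "epochs Flow Ys"


# First setting listed in the entry (experiments not named there).
FIRST_SETTING = SyflowSetting(
    t=0.2, gamma=0.5, lam=0.5, lr_flow=5e-2, lr_s=2e-2,
    epochs_flow_y=2000, epochs_flow_ys=1500,
)

# Setting reported for the Kaggle and UCI experiments (Section 5.2).
KAGGLE_UCI_SETTING = SyflowSetting(
    t=0.2, gamma=0.3, lam=2.0, lr_flow=5e-2, lr_s=2e-2,
    epochs_flow_y=1000, epochs_flow_ys=1000,
)

# Reported setting for the SD-µ, SD-KL and RSD baselines.
BASELINE_SETTING = {"cutpoints": 20, "beam_width": 100, "gamma": 1.0}


def changed_fields(a: SyflowSetting, b: SyflowSetting) -> dict:
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in da if da[k] != db[k]}


if __name__ == "__main__":
    for name, (old, new) in changed_fields(FIRST_SETTING, KAGGLE_UCI_SETTING).items():
        print(f"{name}: {old} -> {new}")
```

Comparing the two settings this way shows that only γ, λ, and the two epoch counts differ; t and both learning rates are the same in both.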