Learning Exceptional Subgroups by End-to-End Maximizing KL-Divergence

Authors: Sascha Xu, Nils Philipp Walter, Janis Kalofolias, Jilles Vreeken

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate SYFLOW on synthetic and real-world data. We show that SYFLOW accurately and reliably learns and characterizes exceptional subgroups, even for complex target distributions.
Researcher Affiliation | Academia | CISPA Helmholtz Center for Information Security, Saarbrücken, Germany. Correspondence to: Sascha Xu <sascha.xu@cispa.de>, Nils P. Walter <nils.walter@cispa.de>.
Pseudocode | Yes | We provide a diagram overviewing and the pseudo-code for SYFLOW in the Appendix C. Appendix C contains Algorithm 1 (fit flow) and Algorithm 2 (SYFLOW).
Open Source Code | Yes | We give the hyperparameters in Appendix D and provide the data generators as well as the code online (footnote 1: https://eda.rg.cispa.io/prj/syflow/).
Open Datasets | Yes | We now turn to real-world data, where we evaluate on regression datasets from the UCI-Machine Learning Repository (footnote 2: https://archive.ics.uci.edu). We conduct a preliminary experiment on the MNIST dataset (LeCun et al., 1998) using the digits 0 and 1 only.
Dataset Splits | No | The paper mentions running experiments multiple times (e.g., 'We run each experiment five times and report the average' and 'For SYFLOW we report the average over 100 runs and report the standard deviation in Tab. 3'), but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed for reproducibility.
Experiment Setup | Yes | For SYFLOW the hyperparameter setting is: t = 0.2, γ = 0.5, λ = 0.5, lr_Flow = 5 × 10^-2, lr_s = 2 × 10^-2, epochs_FlowY = 2000 and epochs_FlowYs = 1500. For SD-µ, SD-KL and RSD, we used 20 cutpoints and a beamwidth of 100, while γ is set to 1.0. For the experiments on Kaggle and UCI data (i.e. Section 5.2), we used for SYFLOW: t = 0.2, γ = 0.3, λ = 2.0, lr_Flow = 5 × 10^-2, lr_s = 2 × 10^-2, epochs_FlowY = 1000 and epochs_FlowYs = 1000.
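
The two SYFLOW hyperparameter settings quoted in the Experiment Setup row can be written down as a small configuration sketch, which makes it easy to see what changes between them. This is only an illustration: the class name, field names, and the `changed_fields` helper are hypothetical and are not taken from the authors' code. The learning rates are entered as 0.05 and 0.02, which assumes the exponents in the quoted values are negative.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SyflowConfig:
    """Hypothetical container; field names are illustrative, not the authors' API."""
    t: float
    gamma: float
    lam: float            # "lambda" is a reserved word in Python
    lr_flow: float        # assumed to mean 5 x 10^-2
    lr_s: float           # assumed to mean 2 x 10^-2
    epochs_flow_y: int
    epochs_flow_ys: int


# First setting quoted in the Experiment Setup row.
DEFAULT = SyflowConfig(t=0.2, gamma=0.5, lam=0.5, lr_flow=5e-2, lr_s=2e-2,
                       epochs_flow_y=2000, epochs_flow_ys=1500)

# Setting quoted for the Kaggle and UCI experiments (Section 5.2).
REAL_WORLD = SyflowConfig(t=0.2, gamma=0.3, lam=2.0, lr_flow=5e-2, lr_s=2e-2,
                          epochs_flow_y=1000, epochs_flow_ys=1000)

# Setting quoted for the SD-mu, SD-KL and RSD baselines (key names are illustrative).
BASELINE = {"cutpoints": 20, "beam_width": 100, "gamma": 1.0}


def changed_fields(a: SyflowConfig, b: SyflowConfig) -> dict:
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in da if da[k] != db[k]}


if __name__ == "__main__":
    for name, (old, new) in changed_fields(DEFAULT, REAL_WORLD).items():
        print(f"{name}: {old} -> {new}")
```

Comparing the two settings this way shows that only γ, λ, and the two epoch counts differ; t and both learning rates are shared.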