Learning Exceptional Subgroups by End-to-End Maximizing KL-Divergence
Authors: Sascha Xu, Nils Philipp Walter, Janis Kalofolias, Jilles Vreeken
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate SYFLOW on synthetic and real-world data. We show that SYFLOW accurately and reliably learns and characterizes exceptional subgroups, even for complex target distributions. |
| Researcher Affiliation | Academia | CISPA Helmholtz Center for Information Security, Saarbrücken, Germany. Correspondence to: Sascha Xu <sascha.xu@cispa.de>, Nils P. Walter <nils.walter@cispa.de>. |
| Pseudocode | Yes | We provide a diagram overviewing and the pseudo-code for SYFLOW in the Appendix C. Appendix C: Algorithm 1: fit flow and Algorithm 2: SYFLOW |
| Open Source Code | Yes | We give the hyperparameters in Appendix D and provide the data generators as well as the code online (https://eda.rg.cispa.io/prj/syflow/). |
| Open Datasets | Yes | We now turn to real-world data, where we evaluate on regression datasets from the UCI Machine Learning Repository (https://archive.ics.uci.edu). We conduct a preliminary experiment on the MNIST dataset (LeCun et al., 1998) using the digits 0 and 1 only. |
| Dataset Splits | No | The paper mentions running experiments multiple times (e.g., 'We run each experiment five times and report the average' and 'For SYFLOW we report the average over 100 runs and report the standard deviation in Tab. 3'), but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed for reproducibility. |
| Experiment Setup | Yes | For SYFLOW the hyperparameter setting is: t = 0.2, γ = 0.5, λ = 0.5, lr_Flow = 5 × 10⁻², lr_s = 2 × 10⁻², epochs_Flow_Y = 2000 and epochs_Flow_Ys = 1500. For SD-µ, SD-KL and RSD, we used 20 cutpoints and a beam width of 100, while γ is set to 1.0. For the experiments on Kaggle and UCI data (i.e., Section 5.2), we used for SYFLOW: t = 0.2, γ = 0.3, λ = 2.0, lr_Flow = 5 × 10⁻², lr_s = 2 × 10⁻², epochs_Flow_Y = 1000 and epochs_Flow_Ys = 1000. |
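
For anyone re-running the experiments, the two SYFLOW settings quoted in the Experiment Setup row could be written down as configuration objects along the following lines. This is only a sketch: the class name `SyflowConfig`, its field names, the constant names, and the helper `changed_fields` are all hypothetical and not taken from the authors' code. The learning rates are read as 5 × 10⁻² and 2 × 10⁻², which is an assumption, since the exponents are damaged in the extracted text.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SyflowConfig:
    """Hypothetical container; field names are illustrative only."""
    t: float
    gamma: float
    lam: float
    lr_flow: float        # assumed to be 5e-2 (exponent garbled in the source)
    lr_s: float           # assumed to be 2e-2 (exponent garbled in the source)
    epochs_flow_y: int
    epochs_flow_ys: int


# First setting quoted in the table.
FIRST_SETTING = SyflowConfig(
    t=0.2, gamma=0.5, lam=0.5,
    lr_flow=5e-2, lr_s=2e-2,
    epochs_flow_y=2000, epochs_flow_ys=1500,
)

# Setting quoted for the Kaggle and UCI experiments (Section 5.2).
KAGGLE_UCI = SyflowConfig(
    t=0.2, gamma=0.3, lam=2.0,
    lr_flow=5e-2, lr_s=2e-2,
    epochs_flow_y=1000, epochs_flow_ys=1000,
)

# Baseline setting quoted for SD-µ, SD-KL and RSD (key names are illustrative).
BASELINE = {"cutpoints": 20, "beam_width": 100, "gamma": 1.0}


def changed_fields(a: SyflowConfig, b: SyflowConfig) -> dict:
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in da if da[k] != db[k]}


if __name__ == "__main__":
    for name, (old, new) in changed_fields(FIRST_SETTING, KAGGLE_UCI).items():
        print(f"{name}: {old} -> {new}")
```

Comparing the two objects shows that only γ, λ, and the two epoch counts change between the settings; t and both learning rates stay the same.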