Identifying Selection Bias from Observational Data

Authors: David Kaltenpoth, Jilles Vreeken

AAAI 2023

| Reproducibility Variable | Result | LLM Response |
|--------------------------|--------|--------------|
| Research Type | Experimental | "Through extensive evaluation on synthetic and real-world data, we verify that our methods beat the state of the art both in detecting as well as characterizing selection bias." |
| Researcher Affiliation | Academia | CISPA Helmholtz Center for Information Security, Germany (david.kaltenpoth@cispa.de, vreeken@cispa.de) |
| Pseudocode | No | No structured pseudocode or algorithm blocks labeled "Algorithm" or "Pseudocode" were found. |
| Open Source Code | Yes | "We make all code and data available online!" |
| Open Datasets | Yes | Palmer Penguins dataset [Gorman, Williams, and Fraser 2014]; Open Exoplanet Catalogue via the ExoData library [Rein 2012; Varley 2016]. |
| Dataset Splits | No | The paper describes how synthetic data is generated and how real data is preprocessed (e.g., "We split the data by penguin species and then for each of them we select the 80% of penguins with the lowest weight from that species."), but it does not specify explicit train/validation/test splits for evaluating the proposed methods. A sketch of this selection step follows the table. |
| Hardware Specification | No | "All experiments finished within a few hours on a commodity laptop." |
| Software Dependencies | No | "We implement our methods in Python using TensorFlow [Abadi et al. 2016]." |
| Experiment Setup | No | The paper describes the general methodology for EXP and INV, but it does not provide specific hyperparameter values (e.g., learning rates, batch sizes, number of epochs, optimizer settings) used in the experiments. |
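To make the quoted preprocessing concrete, below is a minimal sketch of the biased-selection step on the Palmer Penguins data. This is not the authors' code: loading the dataset through seaborn's bundled copy and the column names `species` and `body_mass_g` are assumptions; the paper only states that, per species, the 80% of penguins with the lowest weight are kept.

```python
import seaborn as sns

# Load the Palmer Penguins data (assumption: seaborn's bundled copy,
# with columns 'species' and 'body_mass_g') and drop rows missing weight.
penguins = sns.load_dataset("penguins").dropna(subset=["body_mass_g"])


def lightest_fraction(group, frac=0.8):
    """Keep the `frac` lightest rows of one species group."""
    cutoff = group["body_mass_g"].quantile(frac)
    return group[group["body_mass_g"] <= cutoff]


# Species-wise selection: each species contributes only its lighter
# individuals, inducing selection bias with respect to body mass.
selected = penguins.groupby("species", group_keys=False).apply(lightest_fraction)
```

The result is a sample whose inclusion depends on body mass within each species, which is exactly the kind of non-random selection the paper's detection methods are evaluated against.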