Identifying Selection Bias from Observational Data
Authors: David Kaltenpoth, Jilles Vreeken
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive evaluation on synthetic and real-world data, we verify that our methods beat the state of the art both in detecting as well as characterizing selection bias. |
| Researcher Affiliation | Academia | CISPA Helmholtz Center for Information Security, Germany david.kaltenpoth@cispa.de, vreeken@cispa.de |
| Pseudocode | No | No structured pseudocode or algorithm blocks labeled 'Algorithm' or 'Pseudocode' were found. |
| Open Source Code | Yes | We make all code and data available online.¹ |
| Open Datasets | Yes | Palmer Penguins dataset [Gorman, Williams, and Fraser 2014], Open Exoplanet Catalogue using the Exo Data library [Rein 2012, Varley 2016]. |
| Dataset Splits | No | The paper describes how synthetic data is generated and how real data is preprocessed (e.g., 'We split the data by penguin species and then for each of them we select the 80% of penguins with the lowest weight from that species.'), but it does not specify explicit train/validation/test splits for evaluating the proposed methods. |
| Hardware Specification | No | The paper states only that 'All experiments finished within a few hours on a commodity laptop,' without specifying CPU, GPU, or memory details. |
| Software Dependencies | No | The paper states 'We implement our methods in Python using Tensorflow [Abadi et al. 2016]' but does not provide version numbers for these dependencies. |
| Experiment Setup | No | The paper describes the general methodology for its EXP and INV methods, but it does not report specific hyperparameter values (e.g., learning rates, batch sizes, number of epochs, or optimizer settings) used in the experiments. |