Identifying Selection Bias from Observational Data

Authors: David Kaltenpoth, Jilles Vreeken

AAAI 2023

| Reproducibility Variable | Result | LLM Response |
|--------------------------|--------|--------------|
| Research Type | Experimental | "Through extensive evaluation on synthetic and real-world data, we verify that our methods beat the state of the art both in detecting as well as characterizing selection bias." |
| Researcher Affiliation | Academia | CISPA Helmholtz Center for Information Security, Germany (david.kaltenpoth@cispa.de, vreeken@cispa.de) |
| Pseudocode | No | No structured pseudocode or algorithm blocks labeled "Algorithm" or "Pseudocode" were found. |
| Open Source Code | Yes | "We make all code and data available online!" |
| Open Datasets | Yes | Palmer Penguins dataset [Gorman, Williams, and Fraser 2014]; Open Exoplanet Catalogue via the ExoData library [Rein 2012; Varley 2016]. |
| Dataset Splits | No | The paper describes how synthetic data is generated and how real data is preprocessed (e.g., "We split the data by penguin species and then for each of them we select the 80% of penguins with the lowest weight from that species."), but it does not specify explicit train/validation/test splits for evaluating the proposed methods. A sketch of this selection step follows the table. |
| Hardware Specification | No | "All experiments finished within a few hours on a commodity laptop." |
| Software Dependencies | No | "We implement our methods in Python using TensorFlow [Abadi et al. 2016]." |
| Experiment Setup | No | The paper describes the general methodology for EXP and INV, but it does not provide specific hyperparameter values (e.g., learning rates, batch sizes, number of epochs, optimizer settings) used in the experiments. |
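To make the quoted preprocessing concrete, below is a minimal sketch of the biased-selection step on the Palmer Penguins data. This is not the authors' code: loading the dataset through seaborn's bundled copy and the column names `species` and `body_mass_g` are assumptions; the paper only states that, per species, the 80% of penguins with the lowest weight are kept.

```python
import seaborn as sns

# Load the Palmer Penguins data (assumption: seaborn's bundled copy,
# with columns 'species' and 'body_mass_g') and drop rows missing weight.
penguins = sns.load_dataset("penguins").dropna(subset=["body_mass_g"])


def lightest_fraction(group, frac=0.8):
    """Keep the `frac` lightest rows of one species group."""
    cutoff = group["body_mass_g"].quantile(frac)
    return group[group["body_mass_g"] <= cutoff]


# Species-wise selection: each species contributes only its lighter
# individuals, inducing selection bias with respect to body mass.
selected = penguins.groupby("species", group_keys=False).apply(lightest_fraction)
```

The result is a sample whose inclusion depends on body mass within each species, which is exactly the kind of non-random selection the paper's detection methods are evaluated against.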