reproducibilityindex.ai

Integrating Overlapping Datasets Using Bivariate Causal Discovery

Authors: Anish Dhir, Ciarán M. Lee
Pages: 3781-3790

AAAI 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	A robust comparison between our approach and the IOD algorithm on a range of synthetic and real world data. These cover the regimes of low overlap and low number of variables, low overlap and high number of variables, and high overlap and high number of variables. We also include a performance comparison of the algorithms as a function of the number of overlapping variables.
Researcher Affiliation	Collaboration	Anish Dhir (1), Ciarán M. Lee (1, 2); (1) Babylon Health, London, United Kingdom; (2) University College London, United Kingdom; {anish.dhir, ciaran.lee}@babylonhealth.com
Pseudocode	Yes	Algorithm 1. Input: Overlapping datasets {D1, ..., Dn}, IOD algorithm, bivariate causal discovery algorithm C. Output: Consistent joint causal structures. Step 1: Apply part 1 of the IOD algorithm to return graphs G, G1, ..., Gn.
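Only the first step of Algorithm 1 is quoted above. As a rough illustration of how the output of a bivariate causal discovery algorithm C could narrow a set of candidate joint structures, the sketch below keeps only the graphs whose edge directions agree with a list of pairwise orientations. This is a hypothetical consistency filter, not the authors' procedure: the edge-set graph encoding, the `orientations` input, and the function name `prune_by_orientation` are all assumptions made for this example.

```python
from typing import FrozenSet, List, Tuple

Edge = Tuple[str, str]    # directed edge (cause, effect)
Graph = FrozenSet[Edge]   # a candidate joint structure as a set of directed edges


def prune_by_orientation(candidates: List[Graph],
                         orientations: List[Edge]) -> List[Graph]:
    """Hypothetical filter: drop graphs contradicting any pairwise orientation.

    `orientations` stands in for directions inferred by some bivariate
    causal discovery method. A graph is rejected only if it contains the
    reverse of an inferred edge; graphs with no edge between a pair are kept.
    """
    kept = []
    for graph in candidates:
        contradicted = any((effect, cause) in graph for cause, effect in orientations)
        if not contradicted:
            kept.append(graph)
    return kept
```

For example, given candidates {X→Y, Y→Z} and {Y→X, Y→Z}, an inferred orientation X→Y keeps only the first graph. A filter of this kind can only remove candidates, never add edges.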
Open Source Code	No	The paper does not contain any statement about releasing source code or provide a link to a code repository.
Open Datasets	Yes	Next, the algorithms are compared on the Sachs et al. protein dataset. The ground truth graph was taken from (Sachs et al. 2005). ... We test on Breast Cancer data [3], containing 10 features that describe the cell nucleus present in an image of a breast mass. ... [3] https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
Dataset Splits	No	The paper mentions 'Sample sizes of 3000 were used' for synthetic experiments and discusses 'Analysis of effect of sample size on Causal IOD', but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts) for its experiments.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies	No	The paper mentions statistical tests such as 'HSIC (Gretton et al. 2008)' and 'KCIT (Zhang et al. 2011)' and describes the kernels used, but it does not give version numbers for any software libraries, packages, or programming languages used in the experiments.
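Of the tests named above, HSIC is the simplest to illustrate. The sketch below computes the biased empirical HSIC statistic, (1/n²) tr(KHLH) with centering matrix H = I − (1/n)11ᵀ, as defined by Gretton et al. (2008). The Gaussian kernel, the median-distance bandwidth, and the function names are assumptions for illustration; the paper's own kernels and settings are not reproduced here. The statistic alone is not a calibrated test: obtaining a p-value would also require a null distribution (for example, by permuting one sample).

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def gaussian_gram(xs: Sequence[float], sigma: float) -> Matrix:
    """Gram matrix K_ij = exp(-(x_i - x_j)^2 / (2 sigma^2))."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in xs] for a in xs]


def median_bandwidth(xs: Sequence[float]) -> float:
    """Median heuristic (upper median of pairwise distances); 1.0 if all points coincide."""
    dists = sorted(abs(a - b) for i, a in enumerate(xs) for b in xs[i + 1:])
    med = dists[len(dists) // 2] if dists else 0.0
    return med if med > 0 else 1.0


def double_center(gram: Matrix) -> Matrix:
    """Return H K H: subtract row and column means, add back the grand mean."""
    n = len(gram)
    row = [sum(r) / n for r in gram]
    col = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[gram[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def hsic_biased(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Biased HSIC estimate (1/n^2) tr(K H L H) with Gaussian kernels."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    n = len(xs)
    k_centered = double_center(gaussian_gram(xs, median_bandwidth(xs)))  # H K H
    l_gram = gaussian_gram(ys, median_bandwidth(ys))
    # tr(K H L H) = tr(H K H L); L is symmetric, so this is an elementwise sum.
    return sum(k_centered[i][j] * l_gram[i][j] for i in range(n) for j in range(n)) / n ** 2
```

Larger values indicate stronger dependence. If one sample is constant, its Gram matrix is all ones and the statistic is zero, because the entries of a double-centered matrix sum to zero.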
Experiment Setup	No	The paper states that 'Sample sizes of 3000 were used' for synthetic experiments and refers to 'functional relationships' described in an appendix, but neither the main text nor the appendices give specific hyperparameters, optimizer settings, or other detailed configuration needed to reproduce the experimental setup.