Integrating Overlapping Datasets Using Bivariate Causal Discovery

Authors: Anish Dhir, Ciaran M. Lee
Pages: 3781-3790

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | A robust comparison between our approach and the IOD algorithm on a range of synthetic and real world data. These cover the regimes of low overlap and low number of variables, low overlap and high number of variables, and high overlap and high number of variables. We also include a performance comparison of the algorithms as a function of the number of overlapping variables.
Researcher Affiliation | Collaboration | Anish Dhir,¹ Ciarán M. Lee¹,² ¹Babylon Health, London, United Kingdom ²University College London, United Kingdom {anish.dhir, ciaran.lee}@babylonhealth.com
Pseudocode | Yes | Algorithm 1 Input: Overlapping datasets {D1, ..., Dn}, IOD algorithm, bivariate causal discovery algorithm C. Output: Consistent joint causal structures. 1: Apply part 1 of the IOD algorithm to return graphs G, G1, ..., Gn.
Open Source Code | No | The paper does not contain any statement about releasing source code or provide a link to a code repository.
Open Datasets | Yes | Next, the algorithms are compared on the Sachs et al. protein dataset. The ground truth graph was taken from (Sachs and et al. 2005). ... We test on Breast Cancer data,³ containing 10 features that describe the cell nucleus present in an image of a breast mass. ... ³ https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
Dataset Splits | No | The paper mentions 'Sample sizes of 3000 were used' for synthetic experiments and discusses 'Analysis of effect of sample size on Causal IOD', but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts) for its experiments.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions statistical tests like 'HSIC (Gretton et al. 2008)' and 'KCIT (Zhang et al. 2011)' and describes kernels used, but it does not provide specific version numbers for any software libraries, packages, or programming languages used in the experiments.
Experiment Setup | No | The paper states that 'Sample sizes of 3000 were used' for synthetic experiments and refers to 'functional relationships' in an appendix, but it does not provide specific hyperparameters, optimizer settings, or other detailed training configurations within the main text or its appendices relevant to the experimental setup.
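
The Software Dependencies entry names HSIC (Gretton et al. 2008) as one of the statistical tests the paper relies on, without pointing to an implementation. As a rough illustration of what such a test computes, the sketch below estimates the biased HSIC statistic, (1/n^2) trace(K H L H), for paired scalar samples using only the Python standard library. The Gaussian kernel with a fixed bandwidth of 1.0, the permutation-based p-value, and all function names are assumptions made for this example; they are not taken from the paper, which does not report its kernel settings or code.

```python
import math
import random


def gaussian_gram(values, bandwidth):
    """Gram matrix of a Gaussian (RBF) kernel over a list of scalar samples."""
    return [
        [math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
        for a in values
    ]


def center(gram):
    """Double-centre a Gram matrix, i.e. compute H K H with H = I - (1/n) 11^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def hsic_biased(xs, ys, bandwidth=1.0):
    """Biased HSIC estimate (1/n^2) * trace(K H L H) for paired scalar samples.

    The bandwidth default is an illustrative assumption, not a value from the paper.
    """
    n = len(xs)
    k_centered = center(gaussian_gram(xs, bandwidth))
    l_gram = gaussian_gram(ys, bandwidth)
    # trace(K H L H) = trace((H K H) L) = sum_ij (H K H)_ij * L_ij, as L is symmetric.
    return sum(
        k_centered[i][j] * l_gram[i][j] for i in range(n) for j in range(n)
    ) / n ** 2


def permutation_p_value(xs, ys, n_permutations=200, seed=0):
    """Share of shuffled pairings whose HSIC is at least the observed value.

    A permutation null is an assumption of this sketch, chosen for simplicity.
    """
    rng = random.Random(seed)
    observed = hsic_biased(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if hsic_biased(xs, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)
```

Calling `permutation_p_value(xs, ys)` on two equal-length lists of floats returns a value in (0, 1]; a small value suggests the two variables are dependent. The pure-Python loops scale quadratically in the sample size per evaluation, so for sample sizes like the 3000 quoted above a vectorised implementation would be the more practical choice.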