Integrating Overlapping Datasets Using Bivariate Causal Discovery
Authors: Anish Dhir, Ciaran M. Lee | Pages: 3781-3790
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A robust comparison between our approach and the IOD algorithm on a range of synthetic and real world data. These cover the regimes of low overlap and low number of variables, low overlap and high number of variables, and high overlap and high number of variables. We also include a performance comparison of the algorithms as a function of the number of overlapping variables. |
| Researcher Affiliation | Collaboration | Anish Dhir (1), Ciaran M. Lee (1, 2). (1) Babylon Health, London, United Kingdom; (2) University College London, United Kingdom. {anish.dhir, ciaran.lee}@babylonhealth.com |
| Pseudocode | Yes | Algorithm 1. Input: Overlapping datasets {D1, ..., Dn}, IOD algorithm, bivariate causal discovery algorithm C. Output: Consistent joint causal structures. 1: Apply part 1 of the IOD algorithm to return graphs G, G1, ..., Gn. |
| Open Source Code | No | The paper does not contain any statement about releasing source code or provide a link to a code repository. |
| Open Datasets | Yes | Next, the algorithms are compared on the Sachs et al. protein dataset. The ground truth graph was taken from (Sachs et al. 2005). ... We test on Breast Cancer data [3], containing 10 features that describe the cell nucleus present in an image of a breast mass. ... [3] https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic) |
| Dataset Splits | No | The paper mentions 'Sample sizes of 3000 were used' for synthetic experiments and discusses 'Analysis of effect of sample size on Causal IOD', but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts) for its experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions statistical tests like 'HSIC (Gretton et al. 2008)' and 'KCIT (Zhang et al. 2011)' and describes kernels used, but it does not provide specific version numbers for any software libraries, packages, or programming languages used in the experiments. |
| Experiment Setup | No | The paper states that 'Sample sizes of 3000 were used' for synthetic experiments and refers to 'functional relationships' in an appendix, but neither the main text nor the appendices give specific hyperparameters, optimizer settings, or other detailed configuration for the experiments. |
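
The Software Dependencies row names HSIC (Gretton et al. 2008) as one of the statistical tests used, without giving an implementation. As a rough illustration only, the biased empirical HSIC statistic can be sketched in plain Python as below. This is not the paper's code: the Gaussian kernel, the median-heuristic bandwidth, the normalisation by n^2, and the function names `gaussian_gram`, `center`, and `hsic` are all assumptions made for this example.

```python
import math
import random
from statistics import median


def gaussian_gram(values, bandwidth=None):
    """Gram matrix of a Gaussian kernel on 1-D samples.

    If no bandwidth is given, the median pairwise distance is used
    (a common heuristic, assumed here rather than taken from the paper).
    """
    n = len(values)
    if bandwidth is None:
        dists = [abs(values[i] - values[j]) for i in range(n) for j in range(i + 1, n)]
        bandwidth = median(dists) or 1.0  # fall back to 1.0 if the median is zero
    scale = 2.0 * bandwidth ** 2
    return [[math.exp(-((a - b) ** 2) / scale) for b in values] for a in values]


def center(gram):
    """Double-center a symmetric Gram matrix: H K H with H = I - (1/n) 11^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(xs, ys):
    """Biased empirical HSIC, trace(K H L H) / n^2, for paired 1-D samples."""
    n = len(xs)
    kc = center(gaussian_gram(xs))
    lc = center(gaussian_gram(ys))
    # For symmetric matrices, trace(Kc Lc) equals the sum of elementwise products.
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / n ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0, 1) for _ in range(200)]
    y_dep = [xi ** 2 + 0.1 * rng.gauss(0, 1) for xi in x]  # depends on x
    y_ind = [rng.gauss(0, 1) for _ in range(200)]          # independent of x
    print("dependent:  ", hsic(x, y_dep))
    print("independent:", hsic(x, y_ind))
```

The statistic is close to zero for independent samples and larger for dependent ones. Turning it into a test requires a null distribution (for example via permutation), which this sketch leaves out.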