Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Conditional Independence Testing using Generative Adversarial Networks
Authors: Alexis Bellot, Mihaela van der Schaar
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using synthetic simulations with high-dimensional data we demonstrate significant gains in power over competing methods. In addition, we illustrate the use of our test to discover causal markers of disease in genetic data. Sections 4 and 5 provide experiments on synthetic and real data, respectively. |
| Researcher Affiliation | Academia | Alexis Bellot (1, 2), Mihaela van der Schaar (1, 2, 3). Affiliations: (1) University of Cambridge; (2) The Alan Turing Institute; (3) University of California, Los Angeles |
| Pseudocode | Yes | Pseudo-code for the GCIT and full details on the implementation are given in Supplement D. |
| Open Source Code | Yes | An implementation of our test and tutorial are available at https://bitbucket.org/mvdschaar/mlforhealthlabpub/src/master/alg/gcit/. |
| Open Datasets | Yes | We use the subset of the CCLE data [1] relating to the drug PLX4720; it contains 474 cancer cell lines described by 466 genetic mutations. [1] Jordi Barretina, Giordano Caponigro, Nicolas Stransky, Kavitha Venkatesan, Adam A Margolin, Sungjoon Kim, Christopher J Wilson, Joseph Lehár, Gregory V Kryukov, Dmitriy Sonkin, et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 483(7391):603, 2012. |
| Dataset Splits | No | The paper does not provide specific details about training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) for the datasets used in its experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. It mentions synthetic simulations and experiments but no machine specifications. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. It mentions the use of "Generative Adversarial Networks" and "Energy-based generative neural networks" but no software versions like Python or PyTorch versions. |
| Experiment Setup | Yes | In practice, there will be a trade-off between the objectives of the discriminator and information network but we found that setting λ = 10 in our experiments achieved good performance. |
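
The Experiment Setup entry describes a trade-off between two objectives that is controlled by a single weight, λ = 10. As a minimal sketch of what such a weighting could look like, the snippet below combines two scalar loss values as a weighted sum. The function name, the argument names, and the additive form are assumptions made for illustration; the exact GCIT objective is not reproduced in this summary and should be taken from the paper and its Supplement D.

```python
def combined_generator_loss(adversarial_loss: float,
                            information_loss: float,
                            lam: float = 10.0) -> float:
    """Weighted sum of two scalar loss terms (hypothetical helper).

    adversarial_loss: value of the term driven by the discriminator.
    information_loss: value of the term driven by the information network.
    lam: trade-off weight; 10.0 mirrors the value reported in the table.

    The additive form is an assumption for illustration only.
    """
    return adversarial_loss + lam * information_loss


if __name__ == "__main__":
    # With lam = 0 only the adversarial term contributes.
    print(combined_generator_loss(0.5, 0.1, lam=0.0))
    # With the reported lam = 10 the information term is weighted tenfold.
    print(combined_generator_loss(0.5, 0.1))
```

A larger λ gives the information term more influence on the combined value, and λ = 0 removes it entirely, which is the trade-off the table entry refers to.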