Implicit Causal Models for Genome-wide Association Studies
Authors: Dustin Tran, David M. Blei
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we scale Bayesian inference on up to a billion genetic measurements. We achieve state of the art accuracy for identifying causal factors: we significantly outperform existing genetics methods by an absolute difference of 15-45.3%. |
| Researcher Affiliation | Not disclosed | The reviewed copy is the anonymized submission, stating "Anonymous authors. Paper under double-blind review," so no affiliation is reported. |
| Pseudocode | No | The paper provides an example implementation in Python in Appendix B, but it does not present structured pseudocode or a clearly labeled algorithm block in the main text; a generative sketch of the model follows the table. |
| Open Source Code | No | The paper is under double-blind review and mentions an example implementation in Edward (a third-party library) in Appendix B, but it does not state that the authors' own code for the described methodology is open source, nor does it link to a repository. |
| Open Datasets | Yes | Following Hao et al. (2016), we use the Balding-Nichols model based on the HapMap dataset (Balding & Nichols, 1995; Gibbs et al., 2003); PCA based on the 1000 Genomes Project (TGP) (Consortium et al., 2010); PCA based on the Human Genome Diversity Project (HGDP) (Rosenberg et al., 2002); four variations of the Pritchard-Stephens-Donnelly model (PSD) based on HGDP (Pritchard et al., 2000); and four variations of a configuration where population structure is determined by a latent spatial position of individuals. A Balding-Nichols simulation sketch follows the table. |
| Dataset Splits | No | The paper describes a simulation study where data is generated and then evaluated, but it does not specify explicit training, validation, and test splits for a pre-existing dataset. The validation is implicitly done through the simulation itself, comparing performance on generated data. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. It only mentions general training parameters. |
| Software Dependencies | No | The paper mentions using the "Edward probabilistic programming language (Tran et al., 2016)", but it does not specify a version number for Edward or any other key software components. |
| Experiment Setup | Yes | In all experiments, we use Adam with an initial step-size of 0.005, initialize neural network parameters uniformly with He variance scaling (He et al., 2015), and specify the neural networks for traits and SNPs as fully connected with two hidden layers, ReLU activation, and batch normalization (hidden layer sizes described below). ... We set the latent dimension of confounders to be 6 (following Song et al. (2015)). We use 512 units in both hidden layers of the SNP neural network and use 32 and 256 units for the trait neural network's first and second hidden layers respectively. See the configuration sketch after the table. |
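
Because the paper's Appendix B implementation is not reproduced in this report, the following is a minimal generative sketch of an implicit causal model for GWAS in plain NumPy: per-individual confounders z, SNPs x produced by pushing z plus noise through a neural network, and a trait y generated from (x, z) plus noise. The network shapes mirror the reported setup, but the thresholding to genotype counts and all function names here are our illustrative assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(inputs, weights):
    """Tiny fully connected net with ReLU hidden layers (illustrative)."""
    h = inputs
    for W, b in weights[:-1]:
        h = np.maximum(0.0, h @ W + b)
    W, b = weights[-1]
    return h @ W + b

def init_weights(sizes):
    """He-style variance-scaled initialization, as the paper reports using."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

n_individuals, n_snps, latent_dim = 100, 1000, 6  # latent dim 6 per the paper

# Confounders: one latent vector per individual.
z = rng.normal(size=(n_individuals, latent_dim))

# SNPs: an implicit model pushes confounder + noise through a neural net
# (512 units in both hidden layers), then thresholds sigmoid outputs into
# genotype counts {0, 1, 2} -- the thresholding is our assumption.
snp_net = init_weights([latent_dim + 1, 512, 512, n_snps])
eps_x = rng.normal(size=(n_individuals, 1))
logits = mlp(np.concatenate([z, eps_x], axis=1), snp_net)
x = np.digitize(1.0 / (1.0 + np.exp(-logits)), [1 / 3, 2 / 3]).astype(float)

# Trait: another neural net (32 then 256 hidden units) over (x, z) plus noise.
trait_net = init_weights([n_snps + latent_dim + 1, 32, 256, 1])
eps_y = rng.normal(size=(n_individuals, 1))
y = mlp(np.concatenate([x, z, eps_y], axis=1), trait_net)
print(x.shape, y.shape)  # (100, 1000) (100, 1)
```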
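To make the simulated-data row concrete, here is a hedged NumPy sketch of genotype simulation under the Balding-Nichols model: each population's allele frequencies are Beta-perturbed versions of shared ancestral frequencies, with the perturbation controlled by a fixation index Fst. The population sizes, frequency range, and Fst values below are placeholders, not those of the HapMap-based study.

```python
import numpy as np

def balding_nichols(n_per_pop, n_snps, fst, rng):
    """Simulate genotypes under the Balding-Nichols (1995) model.

    Each SNP has an ancestral frequency p; each population draws its own
    frequency from Beta(p(1-F)/F, (1-p)(1-F)/F), and individual genotypes
    are Binomial(2, pop_freq). Parameter values here are illustrative.
    """
    p = rng.uniform(0.1, 0.9, size=n_snps)          # ancestral frequencies
    genotypes = []
    for F in fst:                                    # one Fst per population
        a, b = p * (1 - F) / F, (1 - p) * (1 - F) / F
        pop_freq = rng.beta(a, b)
        genotypes.append(rng.binomial(2, pop_freq, size=(n_per_pop, n_snps)))
    return np.concatenate(genotypes, axis=0)

rng = np.random.default_rng(0)
# Three populations with illustrative Fst values.
x = balding_nichols(n_per_pop=50, n_snps=1000, fst=[0.01, 0.05, 0.1], rng=rng)
print(x.shape, x.min(), x.max())  # (150, 1000) 0 2
```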
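The experiment-setup row maps directly onto a model configuration. Below is a minimal Keras sketch of the reported hyperparameters: Adam with an initial step-size of 0.005, He-initialized weights, two fully connected ReLU hidden layers with batch normalization, 512/512 units for the SNP network, and 32/256 units for the trait network. The output dimensions and anything not quoted in the row are illustrative assumptions.

```python
import tensorflow as tf

def two_layer_net(hidden1, hidden2, out_dim):
    """Fully connected net with two ReLU hidden layers and batch
    normalization, He-initialized, matching the setup quoted above."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(hidden1, kernel_initializer="he_normal"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Dense(hidden2, kernel_initializer="he_normal"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Dense(out_dim, kernel_initializer="he_normal"),
    ])

# 512 units in both hidden layers of the SNP network; 32 then 256 for the
# trait network; scalar outputs are our placeholder choice.
snp_net = two_layer_net(512, 512, out_dim=1)
trait_net = two_layer_net(32, 256, out_dim=1)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.005)  # reported step-size
```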