reproducibilityindex.ai

A generative nonparametric Bayesian model for whole genomes

Authors: Alan Amin, Eli N Weinstein, Debora Marks

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We explore, theoretically and empirically, applications of BEAR models to a variety of statistical problems including density estimation, robust parameter estimation, goodness-of-ﬁt tests, and two-sample tests. On genomic, transcriptomic, and metagenomic sequence data we show that BEAR models provide large increases in predictive performance as compared to parametric autoregressive models, among other results.
Researcher Affiliation	Academia	Alan N. Aminú 1,2, Eli N. Weinsteinú 2,3 and Debora S. Marks2,4 1 Program in Systems, Synthetic and Quantitative Biology 2 Department of Systems Biology Harvard Medical School 3 Program in Biophysics, Harvard University 4 Broad Institute of Harvard and MIT
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code	Yes	Code is available at https://github. com/debbiemarkslab/BEAR.
Open Datasets	Yes	We considered eleven datasets of four diﬀerent types: whole genome sequencing read data, single cell RNA sequencing read data (including from patient tumors), metagenomic sequencing read data (including from patient fecal samples) and full bacterial genomes from across the tree of life (Section K). [1] 1001 Genomes Consortium. 1,135 genomes reveal the global pattern of polymorphism in Arabidopsis thaliana. [64] P. W. Schreiber et al. Metagenomic virome sequencing... revealed JC polyomavirus transmission.
Dataset Splits	No	The paper states '25% of data was randomly held out for testing', implying the remaining 75% for training, but it does not specify a separate validation split percentage or method.
Hardware Specification	No	The paper discusses processing 'terabyte-scale datasets' and using a 'high-performance kmer counter', but it does not provide specific details such as GPU/CPU models, memory, or cloud instance types used for the experiments.
Software Dependencies	Yes	Using a high-performance kmer counter optimized for nucleotide data, KMC, we can compute the count matrix #( , ) for all observed kmers k in terabyte-scale datasets, even when the matrix does not ﬁt in main memory (Section J.2) [39]. [39] M. Kokot, M. Dlugosz, and S. Deorowicz. KMC 3: counting and manipulating k-mer statistics.
Experiment Setup	No	The paper mentions 'empirical Bayes methods that optimize point estimates of L, h and ' and 'standard stochastic gradient-based optimization', but it does not provide specific hyperparameter values like learning rate, batch size, or number of epochs.