reproducibilityindex.ai

Augur: Data-Parallel Probabilistic Modeling

Authors: Jean-Baptiste Tristan, Daniel Huang, Joseph Tassarotti, Adam C Pocock, Stephen Green, Guy L Steele

NeurIPS 2014

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide experimental results for the two examples presented throughout the paper and in the supplementary material for a Gaussian Mixture Model (GMM). To test multivariate regression and the GMM, we compare Augur’s performance to those of two popular languages for statistical modeling, JAGS [7] and Stan [8].
Researcher Affiliation	Collaboration	(1) Oracle Labs, {jean.baptiste.tristan, adam.pocock, stephen.x.green, guy.steele}@oracle.com; (2) Harvard University, dehuang@fas.harvard.edu; (3) Carnegie Mellon University, jtassaro@cs.cmu.edu
Pseudocode	Yes	object LDA { class sig(var phi: Array[Double], var theta: Array[Double], var z: Array[Int], var w: Array[Int]); val model = bayes { (K:Int,V:Int,M:Int,N:Array[Int]) => { val alpha = vector(K,0.1); val beta = vector(V,0.1); val phi = Dirichlet(V,beta).sample(K); val theta = Dirichlet(K,alpha).sample(M); val w = for(i <- 1 to M) yield { for(j <- 1 to N(i)) yield { val z: Int = Categorical(K,theta(i)).sample(); Categorical(V,phi(z)).sample() }}; observe(w) }}} (a) A LDA model in Augur. The model specifies the distribution p(φ, θ, z | w).
Open Source Code	No	The paper does not provide an explicit statement or link indicating that the source code for the Augur system or its methodology is publicly available.
Open Datasets	Yes	For the linear regression experiment, we used data sets from the UCI regression repository [14].
Dataset Splits	No	The paper mentions using a 'test set' for LDA but does not provide explicit details about training/validation splits (percentages, counts, or specific methods like k-fold cross-validation) for any dataset, nor does it explicitly mention a validation set.
Hardware Specification	Yes	All experiments ran on a single workstation with an Intel Core i7 4820k CPU, 32 GB RAM, and an NVIDIA GeForce Titan Black. The Titan Black uses the Kepler architecture.
Software Dependencies	No	The paper mentions software like Scala, CUDA, JAGS, Stan, and Factorie, but does not specify their version numbers.
Experiment Setup	Yes	For the regression, we configured Augur to use MH1, while for the GMM Augur generated a Gibbs sampler. All probability values are calculated in double precision. The GPU results use all 960 double-precision ALU cores available in the Titan Black.
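
For readers who want to see what the LDA model quoted in the Pseudocode row describes, the following is a minimal plain-Scala sketch of its generative process (forward sampling only). It is not Augur's API and not the code Augur generates, and it performs no posterior inference. The object name `LdaGenerativeSketch` and the helpers `gamma`, `dirichlet`, `categorical`, and `generate` are hypothetical names introduced here for illustration. The sampling methods (Marsaglia-Tsang for Gamma draws, normalised Gamma draws for Dirichlet, inverse-CDF for Categorical) are standard techniques chosen for this sketch and are not taken from the paper. Indices are 0-based, unlike the `1 to M` ranges in the quoted model.

```scala
import scala.util.Random

object LdaGenerativeSketch {
  // Gamma(shape, 1) draw via Marsaglia-Tsang; shapes below 1 use the
  // identity Gamma(a) = Gamma(a + 1) * U^(1/a). (Assumed method, not from the paper.)
  def gamma(shape: Double, rng: Random): Double =
    if (shape < 1.0) {
      gamma(shape + 1.0, rng) * math.pow(rng.nextDouble(), 1.0 / shape)
    } else {
      val d = shape - 1.0 / 3.0
      val c = 1.0 / math.sqrt(9.0 * d)
      var result = -1.0 // sentinel: accepted draws are always positive
      while (result < 0.0) {
        val x = rng.nextGaussian()
        val v = math.pow(1.0 + c * x, 3)
        val u = rng.nextDouble()
        if (v > 0.0 && math.log(u) < 0.5 * x * x + d - d * v + d * math.log(v))
          result = d * v
      }
      result
    }

  // Dirichlet draw: normalise independent Gamma draws.
  def dirichlet(alpha: Array[Double], rng: Random): Array[Double] = {
    val g = alpha.map(a => gamma(a, rng))
    val total = g.sum
    g.map(_ / total)
  }

  // Categorical draw by walking the cumulative distribution.
  def categorical(p: Array[Double], rng: Random): Int = {
    val u = rng.nextDouble()
    var acc = 0.0
    var k = 0
    while (k < p.length - 1) {
      acc += p(k)
      if (u < acc) return k
      k += 1
    }
    p.length - 1
  }

  // Follows the structure of the quoted model: K topics, V vocabulary words,
  // M documents, N(i) tokens in document i. Returns (z, w).
  def generate(K: Int, V: Int, M: Int, N: Array[Int],
               rng: Random): (Array[Array[Int]], Array[Array[Int]]) = {
    val alpha = Array.fill(K)(0.1)
    val beta = Array.fill(V)(0.1)
    val phi = Array.fill(K)(dirichlet(beta, rng))    // per-topic word distributions
    val theta = Array.fill(M)(dirichlet(alpha, rng)) // per-document topic distributions
    val zw = Array.tabulate(M) { i =>
      Array.fill(N(i)) {
        val z = categorical(theta(i), rng)  // topic assignment for this token
        (z, categorical(phi(z), rng))       // word drawn from that topic
      }
    }
    (zw.map(_.map(_._1)), zw.map(_.map(_._2)))
  }
}
```

A call such as `LdaGenerativeSketch.generate(3, 10, 2, Array(5, 7), new Random(0))` would return topic assignments `z` and words `w` for two documents of 5 and 7 tokens. In Augur, by contrast, `w` is observed and the system infers `phi`, `theta`, and `z`.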