Augur: Data-Parallel Probabilistic Modeling

Authors: Jean-Baptiste Tristan, Daniel Huang, Joseph Tassarotti, Adam C Pocock, Stephen Green, Guy L Steele

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide experimental results for the two examples presented throughout the paper and in the supplementary material for a Gaussian Mixture Model (GMM). To test multivariate regression and the GMM, we compare Augur’s performance to those of two popular languages for statistical modeling, JAGS [7] and Stan [8].
Researcher Affiliation | Collaboration | 1 Oracle Labs {jean.baptiste.tristan, adam.pocock, stephen.x.green, guy.steele}@oracle.com; 2 Harvard University dehuang@fas.harvard.edu; 3 Carnegie Mellon University jtassaro@cs.cmu.edu
Pseudocode | Yes |
    object LDA {
      class sig(var phi: Array[Double],
                var theta: Array[Double],
                var z: Array[Int],
                var w: Array[Int])
      val model = bayes {
        (K:Int,V:Int,M:Int,N:Array[Int]) => {
          val alpha = vector(K,0.1)
          val beta = vector(V,0.1)
          val phi = Dirichlet(V,beta).sample(K)
          val theta = Dirichlet(K,alpha).sample(M)
          val w =
            for(i <- 1 to M) yield {
              for(j <- 1 to N(i)) yield {
                val z: Int =
                  Categorical(K,theta(i)).sample()
                Categorical(V,phi(z)).sample()
              }}
          observe(w)
        }}}
    (a) An LDA model in Augur. The model specifies the distribution p(φ, θ, z | w).
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the Augur system or its methodology is publicly available.
Open Datasets | Yes | For the linear regression experiment, we used data sets from the UCI regression repository [14].
Dataset Splits | No | The paper mentions using a 'test set' for LDA but does not provide explicit details about training/validation splits (percentages, counts, or specific methods like k-fold cross-validation) for any dataset, nor does it explicitly mention a validation set.
Hardware Specification | Yes | All experiments ran on a single workstation with an Intel Core i7 4820K CPU, 32 GB RAM, and an NVIDIA GeForce Titan Black. The Titan Black uses the Kepler architecture.
Software Dependencies | No | The paper mentions software like Scala, CUDA, JAGS, Stan, and Factorie, but does not specify their version numbers.
Experiment Setup | Yes | For the regression, we configured Augur to use MH, while for the GMM Augur generated a Gibbs sampler. All probability values are calculated in double precision. The GPU results use all 960 double-precision ALU cores available in the Titan Black.
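
The generative process described by the LDA model quoted in the Pseudocode row can be simulated directly in plain Scala. The sketch below is an illustration only, not Augur's API or the authors' code: the object name LdaGenerativeSketch and the helpers sampleGamma, sampleDirichlet, sampleCategorical, and generate are hypothetical, and the Dirichlet draws use the standard normalized-Gamma construction with a Marsaglia-Tsang Gamma sampler, which the source does not mention. It uses 0-based indices and draws all topic assignments before the words, which gives the same joint distribution as the interleaved loop in the listing.

```scala
import scala.util.Random

// Hypothetical forward simulation of the LDA generative process.
// This is NOT Augur: it only samples (z, w) from the prior; it does no inference.
object LdaGenerativeSketch {

  // Draw from Gamma(shape, 1) with the Marsaglia-Tsang method (shape > 0).
  def sampleGamma(shape: Double, rng: Random): Double = {
    if (shape < 1.0) {
      // Boosting identity: Gamma(a) = Gamma(a + 1) * U^(1/a)
      val u = rng.nextDouble()
      sampleGamma(shape + 1.0, rng) * math.pow(u, 1.0 / shape)
    } else {
      val d = shape - 1.0 / 3.0
      val c = 1.0 / math.sqrt(9.0 * d)
      var result = -1.0
      while (result < 0.0) {
        val x = rng.nextGaussian()
        val v = math.pow(1.0 + c * x, 3)
        if (v > 0.0) {
          val u = rng.nextDouble()
          // Accept the proposal d * v with the Marsaglia-Tsang log test.
          if (math.log(u) < 0.5 * x * x + d - d * v + d * math.log(v)) result = d * v
        }
      }
      result
    }
  }

  // Draw from Dirichlet(conc) by normalizing independent Gamma draws.
  def sampleDirichlet(conc: Array[Double], rng: Random): Array[Double] = {
    val g = conc.map(a => sampleGamma(a, rng))
    val total = g.sum
    g.map(_ / total)
  }

  // Draw an index in [0, probs.length) by inverting the cumulative distribution.
  def sampleCategorical(probs: Array[Double], rng: Random): Int = {
    val u = rng.nextDouble()
    var cumulative = 0.0
    var k = 0
    while (k < probs.length - 1) {
      cumulative += probs(k)
      if (u < cumulative) return k
      k += 1
    }
    probs.length - 1
  }

  // K topics, V vocabulary words, M documents, N(i) words in document i.
  // Returns (z, w): topic assignments and word ids, both indexed [document][position].
  def generate(K: Int, V: Int, M: Int, N: Array[Int], seed: Long): (Array[Array[Int]], Array[Array[Int]]) = {
    val rng = new Random(seed)
    val alpha = Array.fill(K)(0.1) // same hyperparameter value as the quoted listing
    val beta = Array.fill(V)(0.1)
    val phi = Array.fill(K)(sampleDirichlet(beta, rng))    // per-topic word distributions
    val theta = Array.fill(M)(sampleDirichlet(alpha, rng)) // per-document topic mixtures
    val z = Array.tabulate(M)(i => Array.fill(N(i))(sampleCategorical(theta(i), rng)))
    val w = Array.tabulate(M)(i => z(i).map(topic => sampleCategorical(phi(topic), rng)))
    (z, w)
  }
}
```

Calling LdaGenerativeSketch.generate(3, 5, 2, Array(4, 6), 42L) returns topic assignments and word ids for two synthetic documents of lengths 4 and 6. In the Augur listing, by contrast, w is observed and the system derives a sampler for the remaining variables φ, θ, and z.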