Augur: Data-Parallel Probabilistic Modeling
Authors: Jean-Baptiste Tristan, Daniel Huang, Joseph Tassarotti, Adam C Pocock, Stephen Green, Guy L Steele
NeurIPS 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide experimental results for the two examples presented throughout the paper and in the supplementary material for a Gaussian Mixture Model (GMM). To test multivariate regression and the GMM, we compare Augur’s performance to those of two popular languages for statistical modeling, JAGS [7] and Stan [8]. |
| Researcher Affiliation | Collaboration | Oracle Labs ({jean.baptiste.tristan, adam.pocock, stephen.x.green, guy.steele}@oracle.com); Harvard University (dehuang@fas.harvard.edu); Carnegie Mellon University (jtassaro@cs.cmu.edu) |
| Pseudocode | Yes | `object LDA { class sig(var phi: Array[Double], var theta: Array[Double], var z: Array[Int], var w: Array[Int]); val model = bayes { (K: Int, V: Int, M: Int, N: Array[Int]) => { val alpha = vector(K, 0.1); val beta = vector(V, 0.1); val phi = Dirichlet(V, beta).sample(K); val theta = Dirichlet(K, alpha).sample(M); val w = for (i <- 1 to M) yield { for (j <- 1 to N(i)) yield { val z: Int = Categorical(K, theta(i)).sample(); Categorical(V, phi(z)).sample() } }; observe(w) } } }` (a) An LDA model in Augur. The model specifies the distribution p(φ, θ, z \| w). |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the Augur system or its methodology is publicly available. |
| Open Datasets | Yes | For the linear regression experiment, we used data sets from the UCI regression repository [14]. |
| Dataset Splits | No | The paper mentions using a 'test set' for LDA but does not provide explicit details about training/validation splits (percentages, counts, or specific methods like k-fold cross-validation) for any dataset, nor does it explicitly mention a validation set. |
| Hardware Specification | Yes | All experiments ran on a single workstation with an Intel Core i7-4820K CPU, 32 GB RAM, and an NVIDIA GeForce Titan Black. The Titan Black uses the Kepler architecture. |
| Software Dependencies | No | The paper mentions software like Scala, CUDA, JAGS, Stan, and Factorie, but does not specify their version numbers. |
| Experiment Setup | Yes | For the regression, we configured Augur to use MH1, while for the GMM Augur generated a Gibbs sampler. All probability values are calculated in double precision. The GPU results use all 960 double-precision ALU cores available in the Titan Black. |
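The LDA model in the Pseudocode row is written as a generative process. To illustrate what that process does, here is a minimal plain-Scala forward sampler. It is a sketch under stated assumptions, not Augur code: the names `LdaSketch`, `LdaSample`, `sampleGamma`, `sampleDirichlet`, `sampleCategorical`, and `generate` are hypothetical; Dirichlet draws are built from normalized Gamma variates (Marsaglia and Tsang's method); and indices are 0-based rather than following the `1 to M` ranges in the Augur listing. Augur itself conditions on the observed `w` and produces an inference procedure; this sketch only samples forward.

```scala
import scala.util.Random

// Hypothetical plain-Scala sketch of the LDA generative process (not Augur's API).
object LdaSketch {
  // Mirrors the fields of the `sig` class in the Augur model, with 0-based indices.
  final case class LdaSample(
      phi: Array[Array[Double]],   // K topic-word distributions over V words
      theta: Array[Array[Double]], // M document-topic distributions over K topics
      z: Array[Array[Int]],        // topic assignment of each word
      w: Array[Array[Int]])        // the words themselves

  // Gamma(shape, 1) draw via Marsaglia and Tsang; shapes below 1 use the
  // boost Gamma(a) = Gamma(a + 1) * U^(1/a).
  def sampleGamma(shape: Double, rng: Random): Double =
    if (shape < 1.0) {
      sampleGamma(shape + 1.0, rng) * math.pow(rng.nextDouble(), 1.0 / shape)
    } else {
      val d = shape - 1.0 / 3.0
      val c = 1.0 / math.sqrt(9.0 * d)
      var result = -1.0
      while (result < 0.0) {
        val x = rng.nextGaussian()
        val v0 = 1.0 + c * x
        if (v0 > 0.0) {
          val v = v0 * v0 * v0
          val u = rng.nextDouble()
          if (math.log(u) < 0.5 * x * x + d - d * v + d * math.log(v)) result = d * v
        }
      }
      result
    }

  // Dirichlet(alpha) draw: independent Gamma(alpha_k) variates, normalized to sum to 1.
  def sampleDirichlet(alpha: Array[Double], rng: Random): Array[Double] = {
    val g = alpha.map(a => sampleGamma(a, rng))
    val total = g.sum
    g.map(_ / total)
  }

  // Categorical draw by inverting the cumulative distribution of p.
  def sampleCategorical(p: Array[Double], rng: Random): Int = {
    val u = rng.nextDouble() * p.sum
    var acc = 0.0
    var k = 0
    while (k < p.length - 1 && acc + p(k) <= u) { acc += p(k); k += 1 }
    k
  }

  // Forward-samples (phi, theta, z, w); N(i) is the length of document i, so M = N.length.
  def generate(K: Int, V: Int, N: Array[Int], rng: Random): LdaSample = {
    val alpha = Array.fill(K)(0.1) // same symmetric hyperparameters as the Augur listing
    val beta = Array.fill(V)(0.1)
    val phi = Array.fill(K)(sampleDirichlet(beta, rng))
    val theta = Array.fill(N.length)(sampleDirichlet(alpha, rng))
    val z = Array.tabulate(N.length)(i => Array.fill(N(i))(sampleCategorical(theta(i), rng)))
    val w = Array.tabulate(N.length)(i => z(i).map(k => sampleCategorical(phi(k), rng)))
    LdaSample(phi, theta, z, w)
  }

  def main(args: Array[String]): Unit = {
    val s = generate(K = 3, V = 10, N = Array(5, 8), rng = new Random(42))
    s.w.zipWithIndex.foreach { case (doc, i) => println(s"doc $i: ${doc.mkString(" ")}") }
  }
}
```

The case class plays the role of `sig`: it names every random variable in the model, and an inference procedure would update `phi`, `theta`, and `z` while holding `w` fixed.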
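The Experiment Setup row says Augur was configured to use MH1 for the regression. The excerpt does not define MH1, so the following is only a generic random-walk Metropolis-Hastings sketch for Bayesian linear regression, not Augur's sampler. The model (Gaussian likelihood with a known noise scale `sigma`, independent Normal(0, `tau`²) priors on the coefficients), the step size, the synthetic data, and the names `MhRegressionSketch`, `logPosterior`, and `run` are all assumptions. Log densities are kept in `Double`, in the spirit of the row's note that all probability values are computed in double precision.

```scala
import scala.util.Random

// Hypothetical sketch: random-walk Metropolis-Hastings for Bayesian linear regression.
// Assumed model (not from the paper): y_n ~ Normal(x_n . beta, sigma^2), beta_d ~ Normal(0, tau^2).
object MhRegressionSketch {
  // Unnormalized log posterior; additive constants are dropped because they cancel in the MH ratio.
  def logPosterior(beta: Array[Double], xs: Array[Array[Double]], ys: Array[Double],
                   sigma: Double, tau: Double): Double = {
    val logPrior = beta.map(b => -0.5 * b * b / (tau * tau)).sum
    val logLik = xs.indices.map { n =>
      val mean = xs(n).indices.map(d => xs(n)(d) * beta(d)).sum
      val r = ys(n) - mean
      -0.5 * r * r / (sigma * sigma)
    }.sum
    logPrior + logLik
  }

  // Runs `iters` MH steps from beta = 0 and returns the final state of the chain.
  def run(xs: Array[Array[Double]], ys: Array[Double], iters: Int, step: Double,
          sigma: Double, tau: Double, rng: Random): Array[Double] = {
    var beta = Array.fill(xs(0).length)(0.0)
    var lp = logPosterior(beta, xs, ys, sigma, tau)
    for (_ <- 0 until iters) {
      // Symmetric Gaussian proposal, so the Hastings correction cancels.
      val proposal = beta.map(b => b + step * rng.nextGaussian())
      val lpProp = logPosterior(proposal, xs, ys, sigma, tau)
      // Accept with probability min(1, exp(lpProp - lp)), compared in log space.
      if (math.log(rng.nextDouble()) < lpProp - lp) { beta = proposal; lp = lpProp }
    }
    beta
  }

  def main(args: Array[String]): Unit = {
    val xs = Array.tabulate(200)(i => Array(1.0, i / 100.0)) // intercept column plus one feature
    val ys = xs.map(x => 1.0 + 2.0 * x(1))                   // noise-free synthetic targets
    val beta = run(xs, ys, iters = 20000, step = 0.01, sigma = 0.1, tau = 10.0, rng = new Random(7))
    println(beta.mkString(", "))
  }
}
```

Working in log space avoids underflow: with many data points the likelihood itself is far too small to represent as a `Double`, but differences of log densities stay well scaled.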
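For the GMM, the Experiment Setup row says Augur generated a Gibbs sampler. The sketch below shows what such a sampler can look like for a one-dimensional mixture, under simplifying assumptions that are ours, not the paper's: fixed equal mixing weights, a known observation standard deviation `sigma`, and a Normal(0, `s0`²) prior on each component mean. The names `GmmGibbsSketch` and `run` are hypothetical. Each sweep resamples every assignment given the means, then every mean from its conjugate Normal conditional.

```scala
import scala.util.Random

// Hypothetical sketch: Gibbs sampler for a 1-D Gaussian mixture with K components.
// Simplifying assumptions (not from the paper): equal fixed mixing weights,
// known observation std. dev. sigma, and a Normal(0, s0^2) prior on each mean.
object GmmGibbsSketch {
  // Draws an index with probability proportional to exp(logw(k)).
  def sampleCategorical(logw: Array[Double], rng: Random): Int = {
    val mx = logw.max
    val w = logw.map(l => math.exp(l - mx)) // subtracting the max avoids underflow in exp
    val u = rng.nextDouble() * w.sum
    var acc = 0.0
    var k = 0
    while (k < w.length - 1 && acc + w(k) <= u) { acc += w(k); k += 1 }
    k
  }

  def run(x: Array[Double], K: Int, iters: Int, sigma: Double, s0: Double,
          rng: Random): (Array[Double], Array[Int]) = {
    val mu = Array.fill(K)(rng.nextGaussian() * s0) // initial means drawn from the prior
    val z = Array.fill(x.length)(rng.nextInt(K))    // initial random assignments
    for (_ <- 0 until iters) {
      // 1. Resample each assignment z_n given the current means (equal weights cancel).
      for (n <- x.indices) {
        val logw = mu.map(m => -0.5 * (x(n) - m) * (x(n) - m) / (sigma * sigma))
        z(n) = sampleCategorical(logw, rng)
      }
      // 2. Resample each mean from its Normal conditional given its assigned points.
      for (k <- 0 until K) {
        val members = x.indices.filter(n => z(n) == k)
        val precision = 1.0 / (s0 * s0) + members.length / (sigma * sigma)
        val mean = members.map(x(_)).sum / (sigma * sigma) / precision
        mu(k) = mean + rng.nextGaussian() / math.sqrt(precision)
      }
    }
    (mu, z)
  }

  def main(args: Array[String]): Unit = {
    val data = Array.tabulate(100)(i => (if (i < 50) -5.0 else 5.0) + (i % 5 - 2) * 0.2)
    val (mu, _) = run(data, K = 2, iters = 200, sigma = 1.0, s0 = 10.0, rng = new Random(3))
    println(mu.sorted.mkString(", "))
  }
}
```

Given the means, the assignments are conditionally independent, so step 1 splits naturally across data points; that per-datum independence is the kind of work a GPU can run in parallel. An empty component simply falls back to a draw from its prior, which lets it re-enter the data later.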