Scalable Gaussian Process Structured Prediction for Grid Factor Graph Applications
Authors: Sebastien Bratieres, Novi Quadrianto, Sebastian Nowozin, Zoubin Ghahramani
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here we explore a scalable approach to learning GPstruct models based on ensemble learning, with weak learners (predictors) trained on subsets of the latent variables and bootstrap data, which can easily be distributed. We show experiments with 4M latent variables on image segmentation. Our method outperforms widely-used conditional random field models trained with pseudo-likelihood. Moreover, in image segmentation problems it improves over recent state-of-the-art marginal optimisation methods in terms of predictive performance and uncertainty calibration. Finally, it generalises well on all training set sizes. |
| Researcher Affiliation | Collaboration | Sébastien Bratières² (SEBASTIEN@CANTAB.NET), Novi Quadrianto¹,² (N.QUADRIANTO@SUSSEX.AC.UK), Sebastian Nowozin³ (SEBASTIAN@MICROSOFT.COM), Zoubin Ghahramani² (ZOUBIN@ENG.CAM.AC.UK). ¹SMiLe CLiNiC, Department of Informatics, University of Sussex, UK; ²Machine Learning Group, Department of Engineering, University of Cambridge, UK; ³Microsoft Research, Cambridge, UK |
| Pseudocode | Yes | 1. (Distributed stage) For each weak learner t = 1, …, T: (a) Generate bootstrap data D_t = {(x_{t,1}, y_{t,1}), …, (x_{t,N}, y_{t,N})} from the empirical distribution P̂r(x, y) = (1/N) Σ_{n=1}^{N} δ(x − x_n) δ(y − y_n). (b) (Training) Based on the subset of pixel positions V_t ⊂ V on images in D_t, perform training by ESS using PL on V_t as the likelihood, resulting in MCMC samples E_t. (c) (Partial prediction) For each sample E_t, obtain samples E*_t from the MVG E*_t \| E_t. For each sample E*_t, obtain the predictive distribution Pr(y* \| x*, E*_t) using TRW. Aggregate these into Pr(y* \| x*, D_t) ≈ (1/S) Σ Pr(y* \| x*, E*_t), where S is the number of samples E*_t. 2. (Aggregation stage) Compute the complete predictive distribution Pr_T(y* \| x*) as the uniform average of the Pr(y* \| x*, D_t) over all t. |
| Open Source Code | No | The code will be available as a GPstruct toolbox at the authors' homepage. This statement indicates future availability, not current concrete access. |
| Open Datasets | Yes | Stanford Background Dataset (Gould et al., 2009): This dataset consists of 715 images of different sizes, resized to 50 × 150 pixels. Each pixel in the image is labelled with one of 8 classes, i.e. {sky, tree, road, grass, water, building, mountain, foreground object}. LabelMeFacade Image Database (Fröhlich et al., 2010): Our second dataset contains 100 images for training and 845 images for testing. The images are of different sizes and are resized to 50 × 150 pixels. |
| Dataset Splits | No | The paper describes splitting data into training and test sets but does not explicitly mention a separate validation set or details for a validation split. |
| Hardware Specification | No | The paper states: "Computations for GPstruct were distributed on an Amazon cluster using MIT's StarCluster." This mentions a cluster and cloud service but does not provide specific hardware details like GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions "Our implementation was done in Matlab" and "we use J. Domke's toolbox" but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We train these CRF-based models with a regularisation parameter of 10⁻⁴ as in Domke (2013). We use a squared exponential kernel between the pixels, i.e. k(x_i, x_j) = exp(−γ‖x_i − x_j‖²). The kernel width γ is set to 1/(number of features). We train 50 weak learners in total. Each weak learner is trained on 5000 pixel positions uniformly chosen in the training set images. |
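Two of the checklist items above (the kernel in the experiment setup and the aggregation stage of the pseudocode) are simple enough to sketch in code. The following is a minimal illustration, not the authors' Matlab implementation: `sq_exp_kernel` computes k(x_i, x_j) = exp(−γ‖x_i − x_j‖²) with γ defaulting to 1/(number of features) as in the paper, and `aggregate_predictions` performs the uniform averaging of the weak learners' predictive distributions Pr(y*|x*, D_t). The ESS/PL training and TRW partial-prediction steps are omitted; both function names are illustrative.

```python
import numpy as np

def sq_exp_kernel(X1, X2, gamma=None):
    """Squared exponential kernel k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2).

    X1: (n, d) array, X2: (m, d) array. Per the paper's setup, gamma
    defaults to 1 / (number of features).
    """
    if gamma is None:
        gamma = 1.0 / X1.shape[1]
    # Pairwise squared Euclidean distances via the expansion
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b; clip tiny negatives
    # caused by floating-point cancellation.
    sq_dists = (np.sum(X1 ** 2, axis=1)[:, None]
                + np.sum(X2 ** 2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def aggregate_predictions(per_learner_probs):
    """Aggregation stage: uniform average over the T weak learners'
    predictive distributions (each a class-probability vector)."""
    return np.mean(np.stack(per_learner_probs, axis=0), axis=0)
```

As a usage sketch, with 50 weak learners each returning an 8-class probability vector for a test pixel, `aggregate_predictions` returns their elementwise mean, which is again a valid distribution since each input sums to 1.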