Projecting Markov Random Field Parameters for Fast Mixing

Authors: Xianghang Liu, Justin Domke

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments take two stages: first, the parameters are projected (in some divergence), and then the accuracy of sampling with the resulting marginals is compared. We focus on this second aspect. However, we provide a comparison of the computation time for various projection algorithms in Table 1, and when comparing the accuracy of sampling within a given amount of time, we provide two curves for sampling with the original parameters, where one curve has an extra amount of sampling effort roughly approximating the time to perform projection in the reversed KL divergence.
Researcher Affiliation | Academia | Xianghang Liu (NICTA, The University of New South Wales) xianghang.liu@nicta.com.au; Justin Domke (NICTA, The Australian National University) justin.domke@nicta.com.au
Pseudocode | Yes | Algorithm 1: Projected gradient descent for divergence projection (see the sketch after this table).
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper refers to "randomly generated MRF models" and the "Berkeley segmentation dataset" but does not provide specific access information (link, DOI, formal citation with authors/year) for a publicly available or open dataset.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning.
Hardware Specification | Yes | All results use a single core of an Intel i7 860 processor.
Software Dependencies | No | The paper mentions software such as LBFGS-B, but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | Except where otherwise stated, parameters are projected onto the ball {θ : R(θ) ≤ c}, where c = 2.5 is larger than the value of c = 1 suggested by the proofs above. For piecewise projection, grids use simple vertical and horizontal chains of treewidth either one or two. For random graphs, spanning trees are randomly generated until all edges are covered. Gradient descent uses a fixed step size of λ = 0.1. A Gibbs step is one systematic-scan pass over all variables. Projection in the reversed KL divergence maintains a pool of 500 samples, each of which is updated by a single Gibbs step in each iteration.
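For concreteness, below is a minimal Python sketch of the projected-gradient loop named in Algorithm 1, wired up with the settings quoted in the Experiment Setup row (fixed step size λ = 0.1 and projection onto the ball {θ : R(θ) ≤ c} with c = 2.5). This is a sketch under stated assumptions, not the authors' implementation: the names projected_gradient_descent, divergence_grad, project, project_l2_ball, and n_iters are illustrative, the divergence gradient is left as a callable (for the reversed KL it would be estimated from the pool of Gibbs-updated samples), and the Euclidean-ball projection is only a stand-in for the paper's R(θ), which bounds the pairwise parameters to guarantee fast mixing rather than being a plain Euclidean norm.

```python
import numpy as np

def projected_gradient_descent(theta0, divergence_grad, project,
                               step_size=0.1, n_iters=100):
    """Minimize a divergence over theta by alternating gradient steps with
    projection back onto the feasible set {theta : R(theta) <= c}.

    theta0          : original MRF parameters, flattened to a vector
    divergence_grad : callable returning the divergence gradient at theta
                      (for the reversed KL, estimated from a pool of samples,
                      each updated by one Gibbs step per iteration)
    project         : callable mapping theta onto the constraint ball
    """
    theta = project(np.asarray(theta0, dtype=float))
    for _ in range(n_iters):
        # Fixed step size, matching the quoted lambda = 0.1.
        theta = project(theta - step_size * divergence_grad(theta))
    return theta

def project_l2_ball(theta, c=2.5):
    """Illustrative stand-in: exact Euclidean projection onto ||theta|| <= c."""
    norm = np.linalg.norm(theta)
    return theta if norm <= c else theta * (c / norm)
```

A call such as projected_gradient_descent(theta, grad_fn, project_l2_ball) would return parameters satisfying the constraint, which could then be handed to the Gibbs sampler whose accuracy the experiments compare.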