Wormhole Hamiltonian Monte Carlo

Authors: Shiwei Lan, Jeffrey Streets, Babak Shahbaba

AAAI 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate the performance of our method, henceforth called Wormhole Hamiltonian Monte Carlo (WHMC), using three examples. The first example involves sampling from mixtures of Gaussian distributions with varying numbers of modes and dimensions. In this example, which is also discussed by (Ahn, Chen, and Welling 2013), the locations of modes are assumed to be known. The second example, which was originally proposed by (Ihler et al. 2005), involves inference regarding the locations of sensors in a network. For our third example, we also use mixtures of Gaussian distributions, but this time we assume that the locations of modes are unknown. We evaluate our method's performance by comparing it to Regeneration Darting Monte Carlo (RDMC) (Ahn, Chen, and Welling 2013), which is one of the most recent algorithms designed for sampling from multimodal distributions based on the Darting Monte Carlo (DMC) (Sminchisescu and Welling 2011) approach. DMC defines high-density regions around the modes; when the sampler enters these regions, a jump between the regions is attempted. RDMC enriches the DMC method by using the regeneration approach (Mykland, Tierney, and Yu 1995; Gilks, Roberts, and Sahu 1998). However, these methods tend to fail in high-dimensional spaces, where modes are isolated, small, and hard to hit. We compare the two methods (i.e., WHMC and RDMC) in terms of the Relative Error of Mean (REM) proposed by (Ahn, Chen, and Welling 2013). The value of REM at time t is REM(t) = ||θ̄(t) − θ*||_1 / ||θ*||_1, which summarizes the error in approximating the expectation of the variables across all dimensions. Here, θ* is the true mean and θ̄(t) is the mean estimated from the MCMC samples collected up to time t. We examine REM(t) as a function of t until a pre-specified time representing a given computational budget (Ahn, Chen, and Welling 2013; Anoop Korattikara 2014). A sketch of the REM computation appears below the table.
Researcher Affiliation | Academia | Shiwei Lan, Department of Statistics, University of California, Irvine; Jeffrey Streets, Department of Mathematics, University of California, Irvine; Babak Shahbaba, Department of Statistics, University of California, Irvine
Pseudocode | Yes | The attached supplementary file provides the details of our algorithm (Algorithm 1), along with the proof of convergence and its implementation in MATLAB. Algorithm 2 in the supplementary file shows the steps for this method.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the method.
Open Datasets | No | The paper describes generating data for the Mixture of Gaussians example: "The means of these distributions are randomly generated from D-dimensional uniform distributions such that the average pairwise distances remains around 20. The corresponding covariance matrices are constructed in a way that mixture components have different density functions." For the sensor network localization example, it refers to prior work but does not provide direct access to, or a formal citation for, a publicly available dataset used in its experiments. A sketch of this kind of synthetic mixture setup appears below the table.
Dataset Splits | No | The paper describes the setup for its examples (e.g., generating 1000 data points for one example) but does not specify any train, validation, or test dataset splits. It does not mention cross-validation or specific splitting methodologies.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions "implementation in MATLAB" but does not specify a MATLAB version, nor does it list any other software libraries or dependencies and their versions.
Experiment Setup | Yes | For this example, we set G0 = I to make WHMC comparable to standard HMC. Further, we use 0.03 and 0.3 for ε and F, respectively. While RDMC runs four parallel HMC chains initially to discover a subset of modes and to fit a truncated Gaussian distribution around each identified mode, we run four parallel optimizers (different starting points) using the BFGS method. At regeneration times, each chain of RDMC uses the Dirichlet process mixture model to fit a new truncated Gaussian around modes and possibly identify new modes. We, on the other hand, run the BFGS algorithm based on the residual energy function (with T = 1.05) to discover new modes for each chain. A sketch of this multi-start BFGS mode search appears below the table.
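
The Relative Error of Mean (REM) metric quoted in the Research Type row can be computed directly from its definition, REM(t) = ||θ̄(t) − θ*||_1 / ||θ*||_1. The sketch below (Python/NumPy) is a minimal illustration; the chain of draws and the true mean theta_star are hypothetical placeholders, not the paper's data.

```python
import numpy as np

def relative_error_of_mean(samples, theta_true):
    """REM(t) = ||mean(samples up to t) - theta*||_1 / ||theta*||_1."""
    theta_bar = np.asarray(samples).mean(axis=0)   # running MCMC estimate of the mean
    theta_true = np.asarray(theta_true)
    return np.abs(theta_bar - theta_true).sum() / np.abs(theta_true).sum()

# Hypothetical example: track REM as more draws accumulate.
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -2.0, 3.0])            # assumed true mean
draws = theta_star + rng.normal(size=(5000, 3))    # stand-in for MCMC output
rem_trace = [relative_error_of_mean(draws[:t], theta_star)
             for t in range(10, len(draws) + 1, 10)]
```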
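
The Mixture of Gaussians setup quoted in the Open Datasets row (uniformly drawn means with an average pairwise distance around 20, and component-specific covariances) could be simulated along the following lines. This is only an assumed reconstruction: the rescaling step, the covariance construction, and all parameter values are illustrative choices, not the authors' exact procedure.

```python
import numpy as np

def simulate_mixture_parameters(K, D, target_avg_dist=20.0, seed=0):
    """Draw K mean vectors in D dimensions, rescale them so the average
    pairwise distance is roughly target_avg_dist, and build a distinct
    covariance matrix for each component (assumed construction)."""
    rng = np.random.default_rng(seed)
    means = rng.uniform(-1.0, 1.0, size=(K, D))
    # Rescale so the average pairwise distance lands near the target.
    diffs = means[:, None, :] - means[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    means *= target_avg_dist / dists[np.triu_indices(K, k=1)].mean()
    # Give each component a different random SPD covariance.
    covs = []
    for k in range(K):
        A = rng.normal(size=(D, D))
        covs.append(A @ A.T / D + (0.5 + k / K) * np.eye(D))
    return means, covs

means, covs = simulate_mixture_parameters(K=10, D=5)
```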
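
The mode-discovery step described in the Experiment Setup row (several BFGS runs from different starting points) can be illustrated as below. The two-mode target, the number of starts, and the de-duplication tolerance are placeholder assumptions; in particular, this sketch optimizes a plain negative log density rather than the paper's residual energy function.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

# Hypothetical two-mode target, used only for illustration.
def neg_log_density(x):
    p = (0.5 * multivariate_normal.pdf(x, mean=[-5.0, 0.0])
         + 0.5 * multivariate_normal.pdf(x, mean=[5.0, 0.0]))
    return -np.log(p + 1e-300)

def discover_modes(n_starts=4, tol=1e-2, seed=0):
    """Run BFGS from several random starting points and keep the
    distinct local minima as candidate modes."""
    rng = np.random.default_rng(seed)
    modes = []
    for _ in range(n_starts):
        x0 = rng.uniform(-10.0, 10.0, size=2)
        result = minimize(neg_log_density, x0, method="BFGS")
        if all(np.linalg.norm(result.x - m) > tol for m in modes):
            modes.append(result.x)
    return modes

print(discover_modes())   # candidate modes, typically near (-5, 0) and (5, 0)
```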