Gradient-based Adaptive Markov Chain Monte Carlo

Authors: Michalis Titsias, Petros Dellaportas

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. We test the gradient-based adaptive MCMC methods on several simulated and real datasets. We investigate the performance of two instances of the framework: the gradient-based adaptive random walk (gadRWM) detailed in Section 2.2 and the corresponding MALA (gadMALA) detailed in Section 2.3. For gadMALA we consider the exact reparametrisation method that requires evaluation of the Hessian (gadMALAe) and the fast approximate variant (gadMALAf). These schemes are compared against: (i) standard random walk Metropolis (RWM) with proposal N(y | x, σ^2 I); (ii) an adaptive MCMC (AM) that fits a proposal of the form N(y | x, Σ) (we consider a computationally efficient version based on updating the Cholesky factor of Σ; see Supplement); (iii) a standard MALA proposal N(y | x + (σ^2/2) ∇log π(x), σ^2 I); (iv) Hamiltonian Monte Carlo (HMC) with a fixed number of leapfrog steps (either 5, 10, or 20); and (v) the state-of-the-art no-U-turn sampler (NUTS) [17], which is arguably the most efficient adaptive HMC method and automatically determines the number of leapfrog steps.
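To make the two non-adaptive baselines in (i) and (iii) concrete, the sketch below shows one Metropolis step for each: RWM with proposal N(y | x, σ^2 I) and MALA with proposal N(y | x + (σ^2/2) ∇log π(x), σ^2 I). This is an illustration rather than the authors' MATLAB code; `log_pi` and `grad_log_pi` are assumed user-supplied functions for the target.

```python
# Illustrative baseline samplers (not the authors' implementation).
import numpy as np

def rwm_step(x, log_pi, sigma, rng):
    # Random walk Metropolis: symmetric proposal N(y | x, sigma^2 I).
    y = x + sigma * rng.standard_normal(x.shape)
    accept = np.log(rng.uniform()) < log_pi(y) - log_pi(x)
    return (y, True) if accept else (x, False)

def mala_step(x, log_pi, grad_log_pi, sigma, rng):
    # MALA: proposal N(y | x + (sigma^2/2) grad log pi(x), sigma^2 I),
    # corrected for asymmetry via the forward/backward proposal densities.
    mean_x = x + 0.5 * sigma**2 * grad_log_pi(x)
    y = mean_x + sigma * rng.standard_normal(x.shape)
    mean_y = y + 0.5 * sigma**2 * grad_log_pi(y)
    log_q_fwd = -np.sum((y - mean_x)**2) / (2 * sigma**2)
    log_q_bwd = -np.sum((x - mean_y)**2) / (2 * sigma**2)
    log_alpha = log_pi(y) - log_pi(x) + log_q_bwd - log_q_fwd
    accept = np.log(rng.uniform()) < log_alpha
    return (y, True) if accept else (x, False)
```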
Researcher Affiliation: Collaboration. Michalis K. Titsias, DeepMind, London, UK (mtitsias@google.com); Petros Dellaportas, Department of Statistical Science, University College London, UK; Department of Statistics, Athens University of Economics and Business, Greece; and The Alan Turing Institute, UK.
Pseudocode: Yes. Algorithm 1: Gradient-based Adaptive MCMC.
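For orientation, the sketch below shows the general shape such a gradient-based adaptive scheme can take: a proposal N(y | x, L Lᵀ) drawn in reparametrised form y = x + Lε, with the Cholesky factor L adapted by RMSprop on a per-iteration surrogate combining the log acceptance ratio with a log-determinant (entropy-like) term. This is an assumed illustration, not a transcription of the paper's Algorithm 1; the exact objective and update rules are given in the paper.

```python
# Assumed sketch of a gradient-based adaptive RWM loop (not the paper's Algorithm 1).
# Per-iteration surrogate: J(L) = min(0, log pi(x + L eps) - log pi(x)) + beta * sum(log diag(L)).
import numpy as np

def gad_rwm_sketch(log_pi, grad_log_pi, x0, n_adapt, eta=5e-5, beta=0.1, rng=None):
    rng = rng or np.random.default_rng(0)
    n = x0.size
    L = np.diag(np.full(n, 0.1 / np.sqrt(n)))   # diagonal initialisation as in the experiments
    ms = np.zeros_like(L)                       # RMSprop second-moment accumulator
    x = x0.copy()
    for _ in range(n_adapt):
        eps = rng.standard_normal(n)
        y = x + L @ eps                         # reparametrised proposal draw from N(y | x, L L^T)
        log_r = log_pi(y) - log_pi(x)
        # Stochastic gradient of the surrogate w.r.t. the lower-triangular L.
        grad = np.zeros_like(L)
        if log_r < 0:                           # min(0, log r) term is active
            grad += np.tril(np.outer(grad_log_pi(y), eps))
        grad += beta * np.diag(1.0 / np.diag(L))  # entropy / log-det term
        ms = 0.9 * ms + 0.1 * grad**2           # RMSprop update of L
        L = np.tril(L + eta * grad / (np.sqrt(ms) + 1e-8))
        if np.log(rng.uniform()) < min(0.0, log_r):  # standard MH accept/reject
            x = y
    return L, x
```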
Open Source Code: Yes. "We provide our own MATLAB implementation of all algorithms, apart from NUTS for which we consider a publicly available implementation." The accompanying footnote points to https://github.com/mtitsias/gadMCMC.
Open Datasets: Yes. "We first consider an example used in [24] where the target is a zero-mean multivariate Gaussian with diagonal covariance matrix Σ = diag(s_1^2, ..., s_100^2) where the stds s_i take values in the linear grid 0.01, 0.02, ..., 1." and "We considered six binary classification datasets (Australian Credit, Heart, Pima Indian, Ripley, German Credit and Caravan) with a number of examples ranging from n = 250 to n = 5822 and dimensionality of the state/parameter vector ranging from 3 to 87." Reference [24] is Radford M. Neal, "MCMC using Hamiltonian dynamics," Handbook of Markov Chain Monte Carlo, 54:113–162, 2010.
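The first synthetic target is reproducible from the description alone. A possible construction is sketched below; the function names are illustrative, and the log-density is defined only up to an additive constant.

```python
# Sketch of the 100-dimensional Gaussian target described above:
# zero mean, diagonal covariance with stds on the linear grid 0.01, 0.02, ..., 1.
import numpy as np

stds = np.linspace(0.01, 1.0, 100)   # s_1, ..., s_100
variances = stds**2                  # Sigma = diag(s_1^2, ..., s_100^2)

def log_pi(x):
    # log N(x | 0, diag(variances)), up to an additive constant
    return -0.5 * np.sum(x**2 / variances)

def grad_log_pi(x):
    return -x / variances
```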
Dataset Splits: No. The paper mentions 2x10^4 burn-in iterations and 2x10^4 iterations for collecting samples, but it does not specify training, validation, or test dataset splits in terms of data partitioning; it focuses on adaptive burn-in periods instead.
Hardware Specification: No. The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies: No. The paper states "We provide our own MATLAB implementation of all algorithms", but it does not specify version numbers for MATLAB or any other software dependencies.
Experiment Setup: Yes. In all experiments, for AM and the gradient-based adaptive schemes, the Cholesky factor L was initialised to a diagonal matrix with values 0.1/√n on the diagonal, where n is the dimensionality of x. For the AM algorithm the learning rate was set to 0.001/(1 + t/T), where t is the iteration number and T (set to 4000 in all experiments) serves to keep the learning rate nearly constant for the first T training iterations. For the gradient-based adaptive algorithms we use RMSprop (see Section 2.1), where η was set to 0.00005 for gadRWM and to 0.00015 for the gadMALA schemes. NUTS uses its own fully automatic adaptive procedure that determines both the step size and the number of leapfrog steps [17]. For all experiments and algorithms (apart from NUTS) we consider 2x10^4 burn-in iterations and 2x10^4 iterations for collecting samples.
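For convenience, the reported tuning constants can be collected as code. The helper names below are illustrative; only the numerical constants come from the setup described above.

```python
# Hedged summary of the reported tuning choices (constants from the text; names illustrative).
import numpy as np

def init_cholesky(n):
    # Cholesky factor L initialised to a diagonal matrix with 0.1/sqrt(n) on the diagonal
    return np.diag(np.full(n, 0.1 / np.sqrt(n)))

def am_learning_rate(t, T=4000):
    # AM learning rate 0.001 / (1 + t/T): nearly constant for the first T iterations
    return 0.001 / (1.0 + t / T)

# RMSprop step sizes eta for the gradient-based adaptive schemes
ETA_GAD_RWM  = 0.00005
ETA_GAD_MALA = 0.00015

# Iteration budget for all algorithms except NUTS
N_BURNIN  = 20_000   # 2 x 10^4 burn-in iterations
N_SAMPLES = 20_000   # 2 x 10^4 iterations for collecting samples
```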