Parallel Affine Transformation Tuning of Markov Chain Monte Carlo
Authors: Philip Schär, Michael Habeck, Daniel Rudolf
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results of several numerical experiments with PATT are presented in Section 5. We conclude the paper’s main body with some final remarks in Section 6. In addition, we offer complementary information on ATT and related topics in the supplementary material: Appendices A, B and C provide detailed considerations and guidelines regarding the choices of the adjustment types, transformation parameters and update schedules defined in Sections 2 and 3. These appendices may serve as a cookbook for implementing and/or applying ATT or PATT. In place of a related work section, we give a detailed overview of connections between our method and various others in Appendix D. In Appendix E, we prove that in certain cases a simple adaptive MCMC implementation of ATT is equivalent to other, more traditional adaptive MCMC methods, in that the respective transition kernels coincide. The proof of our theoretical result from Section 4 is provided in Appendix F. In Appendix G we elaborate on the models and hyperparameter choices for the experiments behind the results presented in Section 5, and provide some further results. Appendix H presents a series of ablation studies demonstrating that each non-essential component of PATT can, in principle, substantially improve its performance. Appendix I offers more plots illustrating the main experiments as well as the ablation studies. |
| Researcher Affiliation | Academia | ¹Microscopic Image Analysis Group, Friedrich Schiller University Jena, Jena, Germany; ²Faculty of Computer Science and Mathematics, University of Passau, Passau, Germany. Correspondence to: Daniel Rudolf <daniel.rudolf@uni-passau.de>. |
| Pseudocode | Yes | Algorithm 1: ATT transition; Algorithm 2: PATT. (An illustrative sketch of an ATT-style transition appears after the table.) |
| Open Source Code | Yes | The source code for our numerical experiments is provided as a GitHub repository. |
| Open Datasets | Yes | For our second BLR experiment, again following Nishihara et al. (2014), we used the breast cancer Wisconsin (diagnostic) data set (Street et al., 1995)... In our third BLR experiment, we used the Pima diabetes data (Smith et al., 1988)... In our fourth and final experiment on BLR, we used the red wine quality data set (Cortez et al., 2009)... As data for the model we used a small subset of county-wise accumulations of some recent US census data, which we obtained from Kaggle. (See the data-loading sketch after the table.) |
| Dataset Splits | No | When numerically analyzing the sampling performance of PATT and its competitors, we were more interested in their respective long-term efficiency than in their behavior in the early stages. We therefore used a generous burn-in period, in that we considered only those samples generated in the latter half of iterations for this analysis. (The burn-in slice is sketched after the table.) |
| Hardware Specification | No | In order to ensure that our experiments could be executed unaltered on a regular workstation (for the sake of good reproducibility), we ran them on such a machine ourselves. This led us to choose p := 10 (slightly less than the number of available processor cores on our machine) for each of the aforementioned methods throughout all of the experiments. (A core-count heuristic is sketched after the table.) |
| Software Dependencies | No | Instead we relied on the Python interface PyStan of the software package Stan. (A minimal PyStan example follows the table.) |
| Experiment Setup | Yes | For PATT and the naively parallelized versions of HRUSS, AdaRWM and NUTS that we used to run these three methods, we could freely choose the number p of parallel chains maintained by each method. ... we set a parameter n_its ∈ ℕ. ... for AdaRWM, ... we set β := 0.05. ... we used σ² = 10² ... we imposed the independent exponential prior... with fixed rate r = 0.1. (An AdaRWM proposal sketch follows the table.) |
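
The two algorithms named in the Pseudocode row are defined in the paper itself; for orientation only, here is a minimal, illustrative sketch of what a single affine-transformation-tuned transition can look like, using a random-walk Metropolis base kernel. The function name `att_step`, the fixed step size, and the choice of base kernel are ours, not the authors' Algorithm 1:

```python
import numpy as np

def att_step(x, log_density, mu, L, rng, step=0.5):
    """Hedged sketch of one ATT-style transition: run a random-walk
    Metropolis step on the pullback of the target through the affine
    map y -> mu + L @ y, then map the result back."""
    y = np.linalg.solve(L, x - mu)  # current state in tuned coordinates
    y_prop = y + step * rng.standard_normal(y.shape)
    # The Jacobian of a fixed affine map cancels in the acceptance ratio.
    log_alpha = log_density(mu + L @ y_prop) - log_density(mu + L @ y)
    if np.log(rng.uniform()) < log_alpha:
        y = y_prop
    return mu + L @ y
```

In a PATT-like setup, p such chains would run in parallel and (mu, L) would periodically be re-fit from the pooled sample history, e.g. mu as the sample mean and L as a Cholesky factor of the sample covariance.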
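The breast cancer Wisconsin (diagnostic) data set quoted in the Open Datasets row is widely mirrored; one convenient way to load it (scikit-learn is our choice here, not necessarily the authors') is:

```python
from sklearn.datasets import load_breast_cancer

# 569 samples with 30 features each, plus binary diagnosis labels,
# a standard benchmark for Bayesian logistic regression.
X, y = load_breast_cancer(return_X_y=True)
```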
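The burn-in convention quoted in the Dataset Splits row (keep only the latter half of the iterations) is not a train/test split and reduces to a single array slice:

```python
import numpy as np

samples = np.zeros((1000, 2))               # placeholder chain of 1000 draws
post_burn_in = samples[len(samples) // 2:]  # keep only the latter half
```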
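The quoted choice of p := 10 ("slightly less than the number of available processor cores") suggests a simple heuristic; a hedged version, with the cap of 10 taken from the paper and the two-core headroom being our own guess:

```python
import os

# Leave a couple of cores free for the OS, and never exceed the
# paper's reported p = 10 parallel chains.
p = min(10, max(1, (os.cpu_count() or 1) - 2))
```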
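The Software Dependencies row names PyStan without pinning a version; assuming the PyStan 3 interface (`stan.build` / `sample`), a minimal end-to-end call looks like this (the Stan model and data are placeholders, not one of the paper's models):

```python
import stan  # PyStan 3

model_code = """
data { int<lower=0> N; vector[N] x; }
parameters { real mu; }
model { x ~ normal(mu, 1); }
"""

posterior = stan.build(model_code, data={"N": 3, "x": [0.1, -0.2, 0.3]})
fit = posterior.sample(num_chains=4, num_samples=1000)
```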
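In the classic Roberts–Rosenthal adaptive random-walk Metropolis, β is the weight of a fixed fallback proposal mixed with a covariance-adapted one; assuming that is the role of the quoted β := 0.05, the proposal step would read roughly as follows (a sketch under that assumption, not the paper's implementation):

```python
import numpy as np

def adarwm_proposal(x, emp_cov, rng, beta=0.05):
    """Mixture proposal: with probability 1 - beta use the scaled
    empirical covariance, otherwise a small fixed fallback covariance."""
    d = x.shape[0]
    if rng.uniform() < 1.0 - beta:
        cov = (2.38**2 / d) * emp_cov    # adapted component
    else:
        cov = (0.1**2 / d) * np.eye(d)   # non-adaptive safeguard
    return rng.multivariate_normal(x, cov)
```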