Non-reversible Parallel Tempering for Deep Posterior Approximation
Authors: Wei Deng, Qian Zhang, Qi Feng, Faming Liang, Guang Lin
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments, Simulations of Multi-Modal Distributions: We first simulate the proposed algorithm on a distribution π(β) ∝ exp(−U(β)), where β = (β₁, β₂) and U(β) = 0.2(β₁² + β₂²) − 2(cos(2πβ₁) + cos(2πβ₂)). The heat map is shown in Figure 3(a) with 25 modes of different volumes. To mimic big-data scenarios, we can only access the stochastic gradient ∇Ũ(β) = ∇U(β) + 2N(0, I_{2×2}) and the stochastic energy Ũ(β) = U(β) + 2N(0, I). (A NumPy sketch of this synthetic target appears after the table.) |
| Researcher Affiliation | Collaboration | (1) Purdue University, West Lafayette, IN; (2) Morgan Stanley, New York, NY; (3) University of Michigan, Ann Arbor, MI |
| Pseudocode | Yes | Algorithm 1: Non-reversible parallel tempering with SGD-based exploration kernels (DEO⋆-SGD). (A simplified sketch of the even-odd swap pattern follows the table.) |
| Open Source Code | Yes | Code: The code is released at https://github.com/WayneDW/Non-reversible-Parallel-Tempering-for-Deep-Posterior-Approximation for reproduction. |
| Open Datasets | Yes | We choose ResNet20, ResNet32, and ResNet56 and train the models on CIFAR100. |
| Dataset Splits | No | The paper mentions training models on CIFAR100 but does not specify any explicit train/validation/test dataset splits, percentages, or methodology for splitting the data. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | We first run DEO⋆-SGD (P16) based on 16 chains and 20,000 iterations. We fix the lowest learning rate at 0.003 and the highest learning rate at 0.6, and propose to tune the target swap rate S for the acceleration-accuracy trade-off. ... For each model, we first pre-train 10 fixed models via 300 epochs and then run algorithms based on momentum SGD (mSGD) for 500 epochs with 10 parallel chains... We fix the lowest and highest learning rates as 0.005 and 0.02, respectively. (See the learning-rate ladder sketch after the table.) |
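The simulation excerpt quoted in the Research Type row fully specifies the 25-mode synthetic target and the noise injected into its gradient and energy. Below is a minimal NumPy sketch of that setup; the function names (`U`, `grad_U`, `stochastic_grad_U`, `stochastic_energy_U`) are ours and do not come from the released code.

```python
import numpy as np

def U(beta):
    """Deterministic energy: U(beta) = 0.2*(b1^2 + b2^2) - 2*(cos(2*pi*b1) + cos(2*pi*b2))."""
    b1, b2 = beta
    return 0.2 * (b1**2 + b2**2) - 2 * (np.cos(2 * np.pi * b1) + np.cos(2 * np.pi * b2))

def grad_U(beta):
    """Exact gradient of U with respect to (b1, b2)."""
    b1, b2 = beta
    return np.array([
        0.4 * b1 + 4 * np.pi * np.sin(2 * np.pi * b1),
        0.4 * b2 + 4 * np.pi * np.sin(2 * np.pi * b2),
    ])

def stochastic_grad_U(beta, rng):
    """Stochastic gradient: grad U(beta) + 2 * N(0, I_{2x2}), mimicking mini-batch noise."""
    return grad_U(beta) + 2 * rng.standard_normal(2)

def stochastic_energy_U(beta, rng):
    """Stochastic energy: U(beta) + 2 * N(0, I)."""
    return U(beta) + 2 * rng.standard_normal()
```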
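To make the Pseudocode and Experiment Setup rows concrete, here is a simplified sketch of a deterministic even-odd (DEO) swap round over a ladder of SGD exploration chains. The geometric spacing of the learning rates and the greedy energy-ordering swap rule are our simplifying assumptions; the paper's DEO⋆-SGD uses a generalized swap scheme with a target swap rate S and a corrected swap test for stochastic energies, which this sketch does not reproduce.

```python
import numpy as np

def learning_rate_ladder(lr_low=0.003, lr_high=0.6, n_chains=16):
    """Ladder of per-chain learning rates between the reported lowest and highest values.
    Geometric spacing is an assumption, not taken from the paper."""
    return np.geomspace(lr_low, lr_high, n_chains)

def deo_round(betas, energies, t):
    """One even-odd swap round over a list of per-chain states.
    Even rounds attempt swaps on pairs (0,1), (2,3), ...; odd rounds on (1,2), (3,4), ...
    A swap is accepted greedily when the more exploratory (higher-index) chain holds the
    lower energy, a simplification of the stochastic swap test used in the paper."""
    start = 0 if t % 2 == 0 else 1
    for i in range(start, len(betas) - 1, 2):
        if energies[i + 1] < energies[i]:
            betas[i], betas[i + 1] = betas[i + 1], betas[i]
            energies[i], energies[i + 1] = energies[i + 1], energies[i]
    return betas, energies
```

Between swap rounds, each chain would run SGD (or mSGD) steps at its own learning rate from the ladder; calling `learning_rate_ladder(0.005, 0.02, 10)` would correspond to the 10-chain CIFAR100 setup quoted above, again under the geometric-spacing assumption.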