Non-reversible Parallel Tempering for Deep Posterior Approximation

Authors: Wei Deng, Qian Zhang, Qi Feng, Faming Liang, Guang Lin

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments (Simulations of Multi-Modal Distributions): We first simulate the proposed algorithm on a distribution π(β) ∝ exp(−U(β)), where β = (β₁, β₂) and U(β) = 0.2(β₁² + β₂²) − 2(cos(2πβ₁) + cos(2πβ₂)). The heat map is shown in Figure 3(a) with 25 modes of different volumes. To mimic big data scenarios, we can only access the stochastic gradient ∇Ũ(β) = ∇U(β) + 2N(0, I_{2×2}) and the stochastic energy Ũ(β) = U(β) + 2N(0, I). (A sketch of this energy and its noisy evaluations appears after the table.)
Researcher Affiliation | Collaboration | 1 Purdue University, West Lafayette, IN; 2 Morgan Stanley, New York, NY; 3 University of Michigan, Ann Arbor, MI
Pseudocode | Yes | Algorithm 1: Non-reversible parallel tempering with SGD-based exploration kernels (DEO⋆-SGD). (A simplified sketch of the swap schedule appears after the table.)
Open Source Code | Yes | The code is released at https://github.com/WayneDW/Non-reversible-Parallel-Tempering-for-Deep-Posterior-Approximation for reproduction.
Open Datasets | Yes | We choose ResNet20, ResNet32, and ResNet56 and train the models on CIFAR100.
Dataset Splits | No | The paper mentions training models on CIFAR100 but does not specify any explicit train/validation/test splits, percentages, or methodology for splitting the data.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | We first run DEO⋆-SGD (P16) with 16 chains for 20,000 iterations. We fix the lowest learning rate at 0.003 and the highest learning rate at 0.6, and propose to tune the target swap rate S for the acceleration-accuracy trade-off. ... For each model, we first pre-train 10 fixed models via 300 epochs and then run algorithms based on momentum SGD (mSGD) for 500 epochs with 10 parallel chains... We fix the lowest and highest learning rates at 0.005 and 0.02, respectively. (A learning-rate ladder sketch appears after the table.)
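For the multi-modal simulation in the Research Type row, the following is a minimal sketch, assuming NumPy, of the toy energy and the noisy gradient/energy evaluations described above; the function names U, grad_U, noisy_grad_U, and noisy_U are illustrative and not taken from the released code.

```python
import numpy as np

def U(beta):
    """Energy U(beta) = 0.2*(b1^2 + b2^2) - 2*(cos(2*pi*b1) + cos(2*pi*b2))."""
    b1, b2 = beta
    return 0.2 * (b1**2 + b2**2) - 2.0 * (np.cos(2 * np.pi * b1) + np.cos(2 * np.pi * b2))

def grad_U(beta):
    """Exact gradient of U."""
    b1, b2 = beta
    return np.array([0.4 * b1 + 4 * np.pi * np.sin(2 * np.pi * b1),
                     0.4 * b2 + 4 * np.pi * np.sin(2 * np.pi * b2)])

def noisy_grad_U(beta, rng):
    """Stochastic gradient: exact gradient plus 2 * N(0, I_2), as in the row above."""
    return grad_U(beta) + 2.0 * rng.standard_normal(2)

def noisy_U(beta, rng):
    """Stochastic energy: exact energy plus 2 * N(0, 1), as in the row above."""
    return U(beta) + 2.0 * rng.standard_normal()
```

The cosine terms create a grid of local minima whose depths are modulated by the quadratic confinement, consistent with the 25 modes of different volumes mentioned in the row.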
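Algorithm 1 (Pseudocode row) alternates SGD-based exploration with non-reversible, deterministic even-odd (DEO) swap attempts between adjacent chains. The sketch below is a simplified illustration assuming a standard Metropolis-style swap test on noisy energies, with the per-chain learning rate playing the role of a temperature; the helper deo_sgd and this particular swap rule are assumptions, and the paper's generalized DEO⋆ schedule and its handling of noisy energies are not reproduced here. The released repository is the authoritative implementation.

```python
import numpy as np

def deo_sgd(init, lrs, n_iter, noisy_grad_U, noisy_U, rng):
    """Parallel chains with per-chain learning rates (ordered low to high) and DEO swaps."""
    chains = [np.array(init, dtype=float) for _ in lrs]
    for t in range(n_iter):
        # Exploration: one SGD step per chain, driven by its noisy gradient.
        for p, lr in enumerate(lrs):
            chains[p] = chains[p] - lr * noisy_grad_U(chains[p], rng)
        # Communication: deterministic even-odd swap attempts between neighbors
        # (even-indexed pairs on even iterations, odd-indexed pairs otherwise).
        for p in range(t % 2, len(lrs) - 1, 2):
            dE = noisy_U(chains[p], rng) - noisy_U(chains[p + 1], rng)
            dB = 1.0 / lrs[p] - 1.0 / lrs[p + 1]  # 1/lr acts as an inverse "temperature"
            if np.log(rng.uniform()) < dB * dE:   # simplified Metropolis swap test
                chains[p], chains[p + 1] = chains[p + 1], chains[p]
    return chains
```

The deterministic alternation between even and odd pairs is what makes the index process non-reversible and shortens round trips compared with randomly chosen swap pairs.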
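The Experiment Setup row reports only the lowest and highest learning rates for each setting. One natural way to fill in the per-chain values is a geometric ladder between those endpoints; the geometric spacing below is an assumption for illustration, not a detail stated in the row.

```python
import numpy as np

def lr_ladder(lr_low, lr_high, n_chains):
    """Return n_chains learning rates geometrically spaced from lr_low to lr_high."""
    return np.geomspace(lr_low, lr_high, n_chains)

# Simulation setting: 16 chains between 0.003 and 0.6.
print(lr_ladder(0.003, 0.6, 16))
# CIFAR100 setting: 10 parallel chains between 0.005 and 0.02.
print(lr_ladder(0.005, 0.02, 10))

# Combined with the earlier sketches (illustrative only):
# chains = deo_sgd([0.0, 0.0], lr_ladder(0.003, 0.6, 16), 20000,
#                  noisy_grad_U, noisy_U, np.random.default_rng(0))
```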