On Sampling with Approximate Transport Maps

Authors: Louis Grenioux, Alain Oliviero Durmus, Eric Moulines, Marylou Gabrié

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our study concludes that multimodal targets can be reliably handled with flow-based proposals up to moderately high dimensions. In contrast, methods relying on reparametrization struggle with multimodality but are more robust otherwise in high-dimensional settings and under poor training. To further illustrate the influence of target-proposal adequacy, we also derive a new quantitative bound for the mixing time of the Independent Metropolis-Hastings sampler.
Researcher Affiliation | Academia | 1 École Polytechnique. Correspondence to: Louis Grenioux <louis.grenioux@polytechnique.edu>, Alain Oliviero Durmus <alain.durmus@polytechnique.edu>, Eric Moulines <eric.moulines@polytechnique.edu>, Marylou Gabrié <marylou.gabrie@polytechnique.edu>.
Pseudocode | No | The paper does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | The code to reproduce the experiments is available at https://github.com/h2o64/flow_mcmc.
Open Datasets | Yes | Our second experiment is a sparse Bayesian hierarchical logistic regression on the German credit dataset (Dua & Graff, 2017), which has been used as a benchmark in recent papers (Hoffman et al., 2019; Grumitt et al., 2022; Cabezas & Nemeth, 2022). The ground truth was obtained through parallel tempering MD simulations from the same paper. We trained a SN-GAN (Miyato et al., 2018) on the CIFAR10 dataset for 100,000 epochs (implementation from https://github.com/kwotsin/mimicry).
Dataset Splits | No | The paper mentions training on datasets such as German credit and CIFAR10, but it does not provide percentages, sample counts, or a methodology for train/validation/test splits. While standard datasets often have predefined splits, the paper does not state how these were applied in its experiments.
Hardware Specification | Yes | We used a single type of GPU, the Nvidia A100. Each experiment of Section 3 took about 2 hours to run when distributed on 4 GPUs.
Software Dependencies | No | The paper mentions using the Adam (Kingma & Ba, 2014) optimizer and Pyro's (Bingham et al., 2019) implementation of HMC. However, it does not specify version numbers for these or other software libraries (e.g., Python, PyTorch/TensorFlow, CUDA) required for exact reproducibility.
Experiment Setup | Yes | In all of our experiments, we selected the samplers' hyperparameters by optimizing case-specific performance metrics. The length of chains was chosen to be twice the number of steps required to satisfy the R̂ diagnostic (Gelman & Rubin, 1992) at the 1.1 threshold for the fastest converging algorithm. We used MALA as the local sampler as it is suited to the log-concave distributions considered, and is faster and easier to tune than HMC. Evaluation metrics are fully described in App. D.1. (Tables 3, 4, 5, 6, 7, 8, 9 provide specific hyperparameters such as number of iterations, learning rate, hidden layer sizes, MCMC steps, and particle counts for the various experiments.)
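The chain-length rule quoted above relies on the Gelman-Rubin R̂ diagnostic. As an illustrative sketch only (this is the classic, non-rank-normalized formula, and the function name is ours, not from the paper's code), R̂ for a scalar parameter across several chains can be computed as:

```python
import numpy as np

def gelman_rubin_rhat(chains):
    """Classic Gelman-Rubin potential scale reduction factor R-hat.

    chains: array of shape (n_chains, n_samples) holding one scalar
    parameter traced across several independent chains.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # Between-chain variance B and mean within-chain variance W
    b = n * chain_means.var(ddof=1)
    w = chains.var(axis=1, ddof=1).mean()
    # Pooled posterior-variance estimate, then R-hat
    var_plus = (n - 1) / n * w + b / n
    return np.sqrt(var_plus / w)
```

Chains sampling the same distribution give R̂ close to 1, while chains stuck in different regions inflate the between-chain term and push R̂ well above the 1.1 threshold used in the paper.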
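The Experiment Setup row names MALA (Metropolis-adjusted Langevin algorithm) as the local sampler. A minimal single-step sketch, assuming a differentiable log-density (the function names and signature here are our own, not the paper's implementation):

```python
import numpy as np

def mala_step(x, log_prob, grad_log_prob, step_size, rng):
    """One MALA step: Langevin proposal plus Metropolis-Hastings correction."""
    # Proposal: gradient drift plus Gaussian noise, q(x'|x) = N(x + h*grad, 2h*I)
    mean_fwd = x + step_size * grad_log_prob(x)
    prop = mean_fwd + np.sqrt(2.0 * step_size) * rng.normal(size=x.shape)
    mean_bwd = prop + step_size * grad_log_prob(prop)
    # Proposal log-densities up to the shared Gaussian normalizer
    log_q_fwd = -np.sum((prop - mean_fwd) ** 2) / (4.0 * step_size)
    log_q_bwd = -np.sum((x - mean_bwd) ** 2) / (4.0 * step_size)
    log_alpha = log_prob(prop) - log_prob(x) + log_q_bwd - log_q_fwd
    if np.log(rng.uniform()) < log_alpha:
        return prop, True
    return x, False
```

The Metropolis correction makes the chain exact for any step size; on the log-concave targets mentioned in the table, the drift term is what gives MALA its edge over a random-walk proposal.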