Any-scale Balanced Samplers for Discrete Space

Authors: Haoran Sun, Bo Dai, Charles Sutton, Dale Schuurmans, Hanjun Dai

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On various synthetic and real distributions, the proposed sampler substantially outperforms existing approaches. We conducted an experimental evaluation on three types of target distributions: 1) quadratic synthetic distributions, 2) non-quadratic synthetic distributions, and 3) real distributions.
Researcher Affiliation | Collaboration | Haoran Sun hsun349@gatech.edu; Bo Dai bodai@google.com; Charles Sutton charlessutton@google.com; Dale Schuurmans schuurmans@google.com; Hanjun Dai hadai@google.com. Work done during an internship at Google. Affiliations: Georgia Tech; Google Research, Brain Team; University of Alberta.
Pseudocode | Yes | Algorithm 1: AB sampling algorithm; Algorithm 2: AB M-H step; Algorithm 3: Adapting Algorithm; Algorithm 4: Adapting Algorithm Block
Open Source Code | No | No explicit statement or link to open-source code for the methodology is provided.
Open Datasets | Yes | For real distributions, we compare against baseline samplers on challenging inference problems in deep energy-based models trained on the MNIST, Omniglot, and Caltech datasets.
Dataset Splits | No | The paper mentions 'T=100,000 steps, with T1=20,000 burn-in steps to make sure the chain mixes,' which refers to MCMC chain length and burn-in, not explicit dataset splits (train/validation/test) with percentages or counts. For EBMs, it mentions a training framework and the number of steps used to obtain samples, but no explicit dataset splits.
Hardware Specification | Yes | All experiments are run on a virtual machine with CPU: Intel Haswell; GPU: 4x Nvidia V100; System: Debian 10.
Software Dependencies | Yes | In this work, we use the academic version of Mosek (ApS, 2019).
Experiment Setup | Yes | Input: initial σ = 0.1, α = 0.5, W = 0, D = 0; initial x0... For each setting and sampler, we run 100 chains for T = 100,000 steps, with T1 = 20,000 burn-in steps to make sure the chain mixes. Algorithm 3 (Adapting Algorithm) input: initial σ = 0.1, α = 0.5, update rate γ = 0.2, decay rate β = 0.9, initial state x0, buffer size N = 100.
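For reference, the hyperparameters quoted above can be collected into a single configuration sketch. This is a minimal illustration, not code from the paper: the class and field names are assumptions, but the values match the reported setup (σ = 0.1, α = 0.5, γ = 0.2, β = 0.9, N = 100, 100 chains, T = 100,000, T1 = 20,000).

```python
from dataclasses import dataclass


@dataclass
class AdaptiveSamplerConfig:
    # Values quoted from the paper's reported setup; names are illustrative.
    sigma_init: float = 0.1      # initial step-size scale sigma
    alpha_init: float = 0.5      # initial balancing parameter alpha
    update_rate: float = 0.2     # gamma
    decay_rate: float = 0.9      # beta
    buffer_size: int = 100       # N
    num_chains: int = 100        # chains per setting and sampler
    total_steps: int = 100_000   # T
    burn_in_steps: int = 20_000  # T1


def kept_samples_per_chain(cfg: AdaptiveSamplerConfig) -> int:
    """Samples retained per chain after discarding burn-in."""
    return cfg.total_steps - cfg.burn_in_steps


cfg = AdaptiveSamplerConfig()
print(kept_samples_per_chain(cfg))  # 80000
```

Under this setup, each of the 100 chains would contribute 80,000 post-burn-in samples.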