Any-scale Balanced Samplers for Discrete Space
Authors: Haoran Sun, Bo Dai, Charles Sutton, Dale Schuurmans, Hanjun Dai
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On various synthetic and real distributions, the proposed sampler substantially outperforms existing approaches. We conducted an experimental evaluation on three types of target distributions: 1) quadratic synthetic distributions, 2) non-quadratic synthetic distributions, and 3) real distributions. |
| Researcher Affiliation | Collaboration | Haoran Sun (hsun349@gatech.edu), Bo Dai (bodai@google.com), Charles Sutton (charlessutton@google.com), Dale Schuurmans (schuurmans@google.com), Hanjun Dai (hadai@google.com). Affiliations: Georgia Tech; Google Research, Brain Team; University of Alberta. "Work done during an internship at Google." |
| Pseudocode | Yes | Algorithm 1: AB sampling algorithm; Algorithm 2: AB M-H step; Algorithm 3: Adapting Algorithm; Algorithm 4: Adapting Algorithm Block |
| Open Source Code | No | No explicit statement or link to open-source code for the methodology is provided. |
| Open Datasets | Yes | For real distributions, we compare against baseline samplers on challenging inference problems in deep energy based models trained on MNIST, Omniglot, and Caltech datasets. |
| Dataset Splits | No | The paper mentions "T=100,000 steps, with T1=20,000 burn-in steps to make sure the chain mixes," which refers to MCMC chain length and burn-in rather than explicit dataset splits (train/validation/test) with percentages or counts. For EBMs, it describes a training framework and the number of steps used to obtain samples, but gives no explicit dataset splits. |
| Hardware Specification | Yes | All experiments are run on a virtual machine with an Intel Haswell CPU, 4 Nvidia V100 GPUs, and Debian 10. |
| Software Dependencies | Yes | In this work, we use the academic version of Mosek (ApS, 2019). |
| Experiment Setup | Yes | Algorithm inputs: initial σ = 0.1, α = 0.5, W = 0, D = 0; initial x0... For each setting and sampler, 100 chains are run for T = 100,000 steps, with T1 = 20,000 burn-in steps to make sure the chain mixes. Algorithm 3 (Adapting Algorithm) inputs: initial σ = 0.1, α = 0.5, update rate γ = 0.2, decay rate β = 0.9, initial state x0, buffer size N = 100. A sketch of this setup appears after the table. |
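To make the reported setup concrete, below is a minimal, self-contained Python sketch of an adaptive M-H loop wired with the hyperparameters quoted above (σ = 0.1, update rate γ = 0.2, decay rate β = 0.9, buffer size N = 100, T = 100,000 steps with T1 = 20,000 burn-in). The proposal, the target acceptance rate, and the adaptation rule are simplified placeholders assumed for illustration; they are not the paper's actual AB sampler (Algorithms 1–3), and the balancing parameter α is omitted.

```python
import numpy as np

# Hyperparameters as quoted in the reproducibility notes above.
SIGMA0 = 0.1                     # initial scale sigma
GAMMA, BETA = 0.2, 0.9           # update rate and decay rate (Algorithm 3)
BUFFER_SIZE = 100                # adaptation buffer size N
T, T_BURN_IN = 100_000, 20_000   # chain length and burn-in steps

TARGET_ACCEPT = 0.574            # hypothetical target acceptance rate


def mh_step(x, energy_fn, sigma, rng):
    """Toy stand-in for the AB M-H step (Algorithm 2): flip roughly
    sigma * dim random coordinates of a binary state and accept with
    the symmetric Metropolis ratio. Not the paper's actual proposal."""
    k = min(len(x), max(1, round(sigma * len(x))))
    idx = rng.choice(len(x), size=k, replace=False)
    y = x.copy()
    y[idx] = 1 - y[idx]
    accepted = rng.random() < np.exp(energy_fn(x) - energy_fn(y))
    return (y if accepted else x), float(accepted)


def run_chain(energy_fn, dim, seed=0):
    """Run one adaptive chain; only post-burn-in states are kept."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=dim)
    sigma, running = SIGMA0, TARGET_ACCEPT
    buf, samples = [], []
    for t in range(T):
        x, acc = mh_step(x, energy_fn, sigma, rng)
        buf.append(acc)
        if len(buf) == BUFFER_SIZE:
            # Hypothetical adaptation: smooth the acceptance rate with
            # decay BETA, then nudge sigma with update rate GAMMA.
            running = BETA * running + (1 - BETA) * np.mean(buf)
            sigma *= np.exp(GAMMA * (running - TARGET_ACCEPT))
            buf = []
        if t >= T_BURN_IN:
            samples.append(x.copy())
    return samples


if __name__ == "__main__":
    # Example target: a small quadratic (Ising-like) energy on 16 bits.
    rng = np.random.default_rng(0)
    J = 0.1 * rng.normal(size=(16, 16))
    samples = run_chain(lambda x: float(x @ J @ x), dim=16)
```

Per the paper's protocol, 100 such chains would be run per setting and sampler; the adaptation rule here is only meant to show how the quoted update rate γ, decay rate β, and buffer size N interact in a loop of this shape.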