Improved sampling via learned diffusions

Authors: Lorenz Richter, Julius Berner

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "4 NUMERICAL EXPERIMENTS. We evaluate the different methods on the following three numerical benchmark examples."
Researcher Affiliation | Collaboration | Lorenz Richter (Zuse Institute Berlin; dida Datenschmiede GmbH; richter@zib.de), Julius Berner (Caltech; jberner@caltech.edu)
Pseudocode | Yes | "Algorithm 1: Training of a generalized time-reversed diffusion sampler" (a training-loop sketch follows the table)
Open Source Code | Yes | The repository can be found at https://github.com/juliusberner/sde_sampler.
Open Datasets | Yes | Gaussian mixture model (GMM): "We consider ρ(x) = (1/m) ∑_{i=1}^m N(x; μ_i, Σ_i) and choose m = 9, Σ_i = 0.3 I, (μ_i)_{i=1}^9 = {−5, 0, 5} × {−5, 0, 5} ⊂ R^2 to obtain well-separated modes, see Figure 2." (a code sketch of this target follows the table)
Dataset Splits | No | The paper describes generating samples from target distributions and evaluating their quality against ground truth. It does not mention training, validation, or test splits, since the method learns to transport a prior distribution to a target distribution rather than splitting a pre-existing dataset; split information is therefore not applicable.
Hardware Specification | No | The paper only states that "every experiment is executed on a single GPU", which does not specify the GPU model, memory, or any other relevant hardware details.
Software Dependencies | No | The paper mentions a "PyTorch implementation" and the "Adam optimizer" but does not specify version numbers for PyTorch or any other software libraries, which is crucial for reproducibility.
Experiment Setup | Yes | "In particular, we use the Fourier MLPs of Zhang & Chen (2022), a batch size of 2048, and the Adam optimizer. To facilitate the comparisons, we use a fixed number of 200 steps for the Euler-Maruyama scheme. A difference to Berner et al. (2024) is that we observed better performance (for all considered methods and losses) by using an exponentially decaying learning rate starting at 0.005 and decaying every 100 steps to a final learning rate of 10^-4. We use 60000 gradient steps for the experiments with d ≤ 10 and 120000 gradient steps otherwise to approximately achieve convergence." (a configuration sketch follows the table)
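The GMM target quoted in the Open Datasets row can be written down explicitly: 9 equally weighted Gaussians with covariance 0.3 I and means on the grid {−5, 0, 5} × {−5, 0, 5} in R^2. The following is a minimal PyTorch sketch of that density for illustration only; it is not taken from the sde_sampler repository.

```python
import torch
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal

# 9 means on the grid {-5, 0, 5} x {-5, 0, 5} in R^2.
grid = torch.tensor([-5.0, 0.0, 5.0])
means = torch.cartesian_prod(grid, grid)        # shape (9, 2)
scales = torch.full_like(means, 0.3 ** 0.5)     # Sigma_i = 0.3 * I, so per-dimension std = sqrt(0.3)

gmm = MixtureSameFamily(
    mixture_distribution=Categorical(probs=torch.full((9,), 1.0 / 9.0)),
    component_distribution=Independent(Normal(means, scales), 1),
)

x = gmm.sample((2048,))       # reference samples, e.g. for evaluating sample quality
log_rho = gmm.log_prob(x)     # target log-density at the sampled points
```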
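The Pseudocode row refers to Algorithm 1 (training of a generalized time-reversed diffusion sampler). The paper's actual objectives are divergences between path measures (e.g. the log-variance loss), which are not reproduced here; the sketch below is only a hypothetical skeleton, assuming a simple placeholder objective (running control cost plus a terminal term built from the target log-density, reusing the `gmm` target from the sketch above). The function name `simulate_and_loss` and the network interface are assumptions made for illustration.

```python
import torch

def simulate_and_loss(control_net, target_log_prob, batch=2048, n_steps=200, d=2, T=1.0):
    """Simulate a controlled SDE with Euler-Maruyama and return a scalar training loss.

    The loss below is a placeholder control objective; the paper instead studies
    divergences between path measures (e.g. the log-variance loss).
    """
    dt = T / n_steps
    x = torch.randn(batch, d)                  # samples from the prior
    running_cost = torch.zeros(batch)
    for k in range(n_steps):
        t = torch.full((batch, 1), k * dt)
        u = control_net(torch.cat([x, t], dim=-1))            # learned control/drift
        x = x + u * dt + (dt ** 0.5) * torch.randn_like(x)    # Euler-Maruyama step
        running_cost = running_cost + 0.5 * (u ** 2).sum(-1) * dt
    return (running_cost - target_log_prob(x)).mean()
```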
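The hyperparameters quoted in the Experiment Setup row translate into the configuration sketch below. The paper states the initial and final learning rates (0.005 and 10^-4) and the decay interval (every 100 steps) but not the decay factor, so `gamma` here is one consistent choice for the 60000-step runs; `control_net` is a placeholder network (the paper uses the Fourier MLPs of Zhang & Chen, 2022), and `simulate_and_loss` is the assumed helper from the previous sketch.

```python
import torch

# Placeholder drift network with inputs (x, t) in R^3 and outputs in R^2.
control_net = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.GELU(), torch.nn.Linear(64, 2)
)

n_grad_steps = 60_000                                       # 120_000 for d > 10
optimizer = torch.optim.Adam(control_net.parameters(), lr=5e-3)

# Decay every 100 steps; gamma chosen so that 5e-3 * gamma**(n_grad_steps / 100) == 1e-4.
gamma = (1e-4 / 5e-3) ** (100 / n_grad_steps)               # ~0.9935
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=gamma)

for step in range(n_grad_steps):                            # illustrative loop; slow as written
    loss = simulate_and_loss(control_net, gmm.log_prob, batch=2048, n_steps=200)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```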