Learning Energy-Based Generative Models via Coarse-to-Fine Expanding and Sampling
Authors: Yang Zhao, Jianwen Xie, Ping Li
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to validate our approach, including image generation, denoising, inpainting, out-of-distribution detection and unsupervised image translation. Strong results show that our method outperforms or is competitive with the prior art. |
| Researcher Affiliation | Industry | Yang Zhao, Jianwen Xie, Ping Li; Cognitive Computing Lab, Baidu Research; 10900 NE 8th St., Bellevue, WA 98004, USA; {yangzhao.eric, jianwen.kenny, pingli98}@gmail.com |
| Pseudocode | Yes | Algorithm 1: CF-EBM Training |
| Open Source Code | No | The paper does not provide a direct link to the open-source code for the described methodology. |
| Open Datasets | Yes | Datasets: (i) CIFAR-10 (Krizhevsky, 2009) is a dataset containing 60k images at 32×32 resolution in 10 classes; (ii) CelebA (Liu et al., 2015) is a celebrity facial dataset containing over 200k images. For fair comparison with former EBM works, the 64×64 resolution is used for quantitative evaluation; (iii) CelebA-HQ (Karras et al., 2018) contains 30k high-resolution (512×512) facial images. ... We conduct an experiment on the continuous MNIST for likelihood evaluation by following the setting in Du & Mordatch (2019). Its open source implementation is at https://github.com/openai/ebm_code_release. ... We use four unpaired image translation datasets for evaluation, including cat2dog, Yosemite summer2winter, vangogh2photo and apple2orange. All images are resized to 256×256 pixels. More details are provided in Appendix A.9.1. ... (i) selfie2anime: This dataset is first introduced in Kim et al. (2020), and contains a selfie domain and an anime domain, each of which has 3400 training images and 100 testing images. |
| Dataset Splits | No | The paper mentions training, testing, and evaluation on the datasets, but it does not specify train/validation/test splits beyond giving the test-set sizes for some datasets. |
| Hardware Specification | Yes | Most of them are implemented with the same network architecture and run on the TITAN X (12GB) platform. ... On a single TITAN X GPU (12GB), the total training times (200k iterations) of CelebA-HQ 128×128 and 256×256 are about 120 hours and 235 hours, respectively. On a single TITAN V100 GPU, these costs can be reduced to 55 hours and 100 hours (we tested it once). |
| Software Dependencies | No | The paper mentions using the Adam (Kingma & Ba, 2014) optimizer but does not provide version numbers for any software, libraries, or programming languages used. |
| Experiment Setup | Yes | We train the network using the Adam (Kingma & Ba, 2014) optimizer with β1 = 0.5 and β2 = 0.999. We set the learning rate schedule as α = {8×8: 0.001, 16×16: 0.001, 32×32: 0.001, 64×64: 0.0012, 128×128: 0.0015} and the data feeding schedule as N = {8×8: 50k, 16×16: 75k, 32×32: 100k, 64×64: 125k, 128×128: 150k}. As to the sampling hyperparameters, we set the schedule of the number of Langevin steps as T = {8×8: 15, 16×16: 30, 32×32: 50, 64×64: 50, 128×128: 60}, the Langevin step size as 1.0, and the variance of the Langevin noise term as η_t = 2e-2 − 2e-2/(T − t + 1) in most experiments. |
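
The Experiment Setup row above spells out the optimizer settings and the per-resolution Langevin schedules, so a short sketch can make the configuration concrete. The snippet below is a minimal PyTorch sketch, not the authors' released code: the energy network `ebm`, the helper names `langevin_sample` and `make_optimizer`, the dictionary layout of the schedules, the way the step size enters the update (x ← x − (step_size/2)·∇E(x) + noise), and the reconstructed noise-variance formula η_t = 2e-2 − 2e-2/(T − t + 1) are all assumptions made for illustration.

```python
import torch

# Per-resolution schedules quoted in the "Experiment Setup" row
# (resolution -> value); the dict layout itself is an assumption.
LR_SCHEDULE    = {8: 1e-3, 16: 1e-3, 32: 1e-3, 64: 1.2e-3, 128: 1.5e-3}
STEPS_SCHEDULE = {8: 15, 16: 30, 32: 50, 64: 50, 128: 60}
STEP_SIZE = 1.0  # Langevin step size quoted in the paper


def langevin_sample(ebm, x, resolution):
    """Run the scheduled number of Langevin steps on a batch x."""
    T = STEPS_SCHEDULE[resolution]
    for t in range(T):
        x = x.detach().requires_grad_(True)
        # Gradient of the energy sum w.r.t. the samples.
        grad = torch.autograd.grad(ebm(x).sum(), x)[0]
        # Assumed reconstruction of the noise-variance schedule, with the
        # step counted 1..T as in the paper; the noise vanishes at the last step.
        step = t + 1
        eta_t = 2e-2 - 2e-2 / (T - step + 1)
        x = x - 0.5 * STEP_SIZE * grad + eta_t ** 0.5 * torch.randn_like(x)
    return x.detach()


def make_optimizer(ebm, resolution):
    """Adam with beta1 = 0.5, beta2 = 0.999 and the per-resolution learning rate."""
    return torch.optim.Adam(ebm.parameters(),
                            lr=LR_SCHEDULE[resolution],
                            betas=(0.5, 0.999))
```

In this reading, η_t is treated as the variance of the Gaussian noise added at step t (hence the square root when scaling `randn_like`); whether the released implementation applies the step size exactly this way is not stated in the excerpts above.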