Learning Energy-Based Model with Variational Auto-Encoder as Amortized Sampler

Authors: Jianwen Xie, Zilong Zheng, Ping Li

AAAI 2021, pp. 10441-10451

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present experiments to demonstrate the effectiveness of our strategy to train EBMs with (a) competitive synthesis for images, (b) high expressiveness of the learned latent variable model, and (c) strong performance in image completion. We use the PaddlePaddle deep learning platform. [§6.1 Image Generation] We show that our framework is effective in representing a probability density of images and demonstrate that the learned model can generate realistic image patterns. We learn our model from MNIST (LeCun et al. 1998), Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017), and CIFAR-10 (Krizhevsky 2009) images without class labels. Figure 6 shows examples generated by ancestral Langevin sampling. We also quantitatively evaluate the quality of the generated images via the FID score (Heusel et al. 2017) and the Inception score (Salimans et al. 2016) in Tables 1 and 2. The experiments validate the effectiveness of our model.
Researcher Affiliation | Industry | Jianwen Xie, Zilong Zheng, Ping Li; Cognitive Computing Lab, Baidu Research, 10900 NE 8th St., Bellevue, WA 98004, USA; {jianwen.kenny, zlzheng.cs, pingli98}@gmail.com
Pseudocode | Yes | Algorithm 1: Cooperative training of EBM and VAE via variational MCMC teaching (a hedged sketch of this loop appears after the table).
Open Source Code | No | The paper cites 'https://www.paddlepaddle.org.cn' in a footnote about the deep learning platform used, but it neither links to nor explicitly mentions source code for the methodology described in the paper.
Open Datasets | Yes | We learn our model from MNIST (LeCun et al. 1998), Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017), and CIFAR-10 (Krizhevsky 2009) images without class labels. [...] on the Paris Street View (Pathak et al. 2016) and CMP Facade (Tyleček and Šára 2013) datasets.
Dataset Splits | No | The paper uses standard datasets (MNIST, Fashion-MNIST, CIFAR-10, Paris Street View, and CMP Facade). While it implicitly trains on the standard training sets, it does not explicitly specify validation splits (e.g., percentages, sample counts, or an explicit mention of a validation set), which would be needed for reproduction.
Hardware Specification | No | The paper states 'We use the PaddlePaddle deep learning platform' but gives no specifics about the hardware used, such as GPU or CPU models, memory, or cloud instance types.
Software Dependencies | No | The paper mentions PaddlePaddle and the Adam optimizer but provides no version numbers for any software dependencies or libraries.
Experiment Setup | Yes | We design all networks in our model with simple convolution and ReLU layers, and only use 15 or 50 Langevin steps. The Langevin step size δ = 0.002. The number of latent dimensions d = 200. Table 4 shows the influence of varying the number of Langevin steps and the Langevin step size, respectively. Table 5 displays the Inception scores as a function of the number of latent dimensions of qα(x). We set l = 10, δ = 0.002, and γ = 2. Table 6 displays the Inception scores for varying γ, with d = 200, l = 10, and δ = 0.002. The optimal choice of γ in our model is roughly 2. (These Langevin settings are used in the sketches below.)
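
To make the sampling settings quoted above concrete, here is a minimal sketch of the Langevin refinement that "ancestral Langevin sampling" applies to VAE proposals. It assumes PyTorch (the paper itself uses PaddlePaddle), and energy_fn is a hypothetical stand-in for the learned EBM's energy network, not the authors' implementation; the defaults match the step count and step size quoted in the Experiment Setup row.

```python
import torch

def langevin_refine(x_init, energy_fn, n_steps=15, step_size=0.002):
    """Refine VAE samples by Langevin dynamics under the EBM energy.

    x_init    -- initial images drawn ancestrally from the VAE decoder
    energy_fn -- hypothetical callable: batch of images -> per-sample energy
    n_steps   -- number of Langevin steps (the paper reports 15 or 50)
    step_size -- Langevin step size delta (the paper uses 0.002)
    """
    x = x_init.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        grad, = torch.autograd.grad(energy_fn(x).sum(), x)
        # Langevin update: x <- x - (delta^2 / 2) * dE/dx + delta * noise
        x = x - 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(x)
        x = x.detach().requires_grad_(True)
    return x.detach()
```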
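
And a hedged sketch of one iteration of the cooperative loop named in the Pseudocode row (Algorithm 1, variational MCMC teaching), under the same PyTorch assumption. The names encoder, decoder, and energy_fn, and the Gaussian-decoder reconstruction loss, are illustrative placeholders reconstructed from the quoted text, not the authors' code; the γ weighting from the setup row is omitted because the report does not say which term it scales.

```python
import torch
import torch.nn.functional as F

def cooperative_step(x_data, encoder, decoder, energy_fn, opt_ebm, opt_vae,
                     latent_dim=200, n_steps=15, step_size=0.002):
    """One cooperative update in the spirit of Algorithm 1 (sketch only)."""
    n = x_data.size(0)

    # 1. Ancestral sampling: draw latents and decode them into proposals.
    z0 = torch.randn(n, latent_dim, device=x_data.device)
    x0 = decoder(z0)

    # 2. MCMC revision: refine the VAE proposals with Langevin dynamics
    #    (langevin_refine from the previous sketch).
    x_revised = langevin_refine(x0, energy_fn, n_steps, step_size)

    # 3. EBM update: lower the energy of observed data and raise it on the
    #    revised samples (stochastic estimate of the log-likelihood gradient).
    ebm_loss = energy_fn(x_data).mean() - energy_fn(x_revised).mean()
    opt_ebm.zero_grad()
    ebm_loss.backward()
    opt_ebm.step()

    # 4. VAE update (MCMC teaching): fit the Langevin-revised samples via the
    #    ELBO, so the amortized sampler chases the EBM's revised distribution.
    mu, logvar = encoder(x_revised)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
    recon = decoder(z)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    rec = F.mse_loss(recon, x_revised, reduction='none').flatten(1).sum(1).mean()
    vae_loss = rec + kl
    opt_vae.zero_grad()
    vae_loss.backward()
    opt_vae.step()
```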