Training Energy-Based Normalizing Flow with Score-Matching Objectives

Authors: Chen-Hao Chao, Wei-Fang Sun, Yen-Chang Hsu, Zsolt Kira, Chun-Yi Lee

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results demonstrate that our approach achieves a significant speedup compared to maximum likelihood estimation while outperforming prior methods by a noticeable margin in terms of negative log-likelihood (NLL). In the following experiments, we first compare the training efficiency of the baselines trained with L_ML and EBFlow trained with L_SML, L_SSM, L_FDSSM, and L_DSM to validate the effectiveness of the proposed method in Sections 5.1 and 5.2.
Researcher Affiliation | Collaboration | 1 Elsa Lab, National Tsing Hua University, Hsinchu City, Taiwan; 2 NVIDIA AI Technology Center, NVIDIA Corporation, Santa Clara, CA, USA; 3 Samsung Research America, Mountain View, CA, USA; 4 Georgia Institute of Technology, Atlanta, GA, USA.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures).
Open Source Code | Yes | The code implementation for the experiments is provided in the following repository: https://github.com/chen-hao-chao/ebflow.
Open Datasets | Yes | The experiments presented in Section 5.2 are performed on the MNIST [19] and CIFAR-10 [37] datasets.
Dataset Splits | Yes | A comparison of the training efficiency of the FC-based and CNN-based models evaluated on the validation set of MNIST and CIFAR-10.
Hardware Specification | Yes | The runtime is measured on NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper mentions "PyTorch [39]" and "TensorBoard [49]" but does not specify version numbers for these or any other software dependencies, making it difficult to replicate the exact software environment.
Experiment Setup | Yes | The optimizers include Adam [46], AdamW [47], and RMSProp. The learning rate and gradient clipping values are selected from (5e-3, 1e-3, 5e-4, 1e-4) and (None, 2.5, 10.0), respectively, and Table A1 summarizes the selected hyperparameters. The FC-based and CNN-based models are trained with RMSProp using a learning rate initialized at 1e-4 and a batch size of 100. The Glow model is trained with an Adam optimizer using a learning rate initialized at 1e-4 and a batch size of 100, with the gradient clipping value set to 500 during training.
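
The Research Type entry above quotes the paper's comparison of maximum-likelihood training (L_ML) against EBFlow's score-matching objectives (L_SML, L_SSM, L_FDSSM, L_DSM). As a point of reference only, the sketch below shows a generic denoising score matching (DSM) loss for an energy-based model in PyTorch; the function name dsm_loss, the energy_fn interface, and the noise level sigma are illustrative assumptions, not the authors' implementation, whose exact objectives are defined in the paper and the released repository.

```python
import torch

def dsm_loss(energy_fn, x, sigma=0.1):
    """Generic denoising score matching loss for an energy-based model.

    `energy_fn` maps a batch of inputs to per-sample scalar energies E(x),
    so the model score is -dE/dx. This is an illustrative sketch; the
    paper's L_DSM (and the L_SML, L_SSM, L_FDSSM variants) are defined in
    https://github.com/chen-hao-chao/ebflow.
    """
    # Perturb the data with Gaussian noise of scale sigma.
    noise = torch.randn_like(x)
    x_tilde = (x + sigma * noise).requires_grad_(True)

    # Model score: negative gradient of the energy w.r.t. the perturbed input.
    energy = energy_fn(x_tilde).sum()
    score = -torch.autograd.grad(energy, x_tilde, create_graph=True)[0]

    # Score of the Gaussian perturbation kernel: (x - x_tilde) / sigma**2 = -noise / sigma.
    target = -noise / sigma

    return 0.5 * ((score - target) ** 2).flatten(1).sum(dim=1).mean()
```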
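
The Experiment Setup and Open Datasets entries report RMSProp with a learning rate of 1e-4, a batch size of 100, gradient clipping, and MNIST/CIFAR-10 as training data. A minimal PyTorch sketch of that configuration follows; the model here is a placeholder (the actual FC-based, CNN-based, and Glow architectures live in the authors' repository), the preprocessing and loss are simplified stand-ins, and the use of norm-based clipping is an assumption since the paper only states a clipping value.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Placeholder model; the actual FC-based, CNN-based, and Glow architectures
# are defined in https://github.com/chen-hao-chao/ebflow.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 28 * 28))

# MNIST with a plain tensor transform; the paper's preprocessing
# (e.g. dequantization) may differ from this simplified setup.
train_set = datasets.MNIST(root="data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=100, shuffle=True)  # batch size 100

# RMSProp with lr 1e-4, as reported for the FC/CNN-based models; the Glow
# model instead uses Adam (lr 1e-4) with a clipping value of 500.
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)
clip_value = 500.0  # whether this is a norm or value threshold is not stated

for x, _ in loader:
    # Placeholder loss; the actual training loop would use one of the
    # paper's objectives (L_ML, L_SML, L_SSM, L_FDSSM, or L_DSM).
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)
    optimizer.step()
    break  # single optimization step, for illustration only
```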