Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models

Authors: Sangwoong Yoon, Himchan Hwang, Dohyun Kwon, Yung-Kyun Noh, Frank Park

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide experimental results demonstrating the effectiveness of DxMI in training diffusion models and EBMs. On image generation tasks, DxMI can train strong short-run diffusion models that generate samples in 4 or 10 neural network evaluations. DxMI can also be used to train strong energy-based anomaly detectors.
Researcher Affiliation | Collaboration | Sangwoong Yoon1, Himchan Hwang2, Dohyun Kwon1,3, Yung-Kyun Noh1,4, Frank C. Park2,5; 1Korea Institute for Advanced Study, 2Seoul National University, 3University of Seoul, 4Hanyang University, 5Saige Research
Pseudocode | Yes | Algorithm 1: Diffusion by Maximum Entropy IRL; Algorithm 2: Diffusion by Maximum Entropy IRL for Image Generation
Open Source Code | Yes | The code for DxMI can be found at https://github.com/swyoon/Diffusion-by-MaxEntIRL.git.
Open Datasets | Yes | On image generation tasks, we show that DxMI can be used to fine-tune a diffusion model with reduced generation steps, such as T = 4 or 10. We test DxMI on unconditional CIFAR-10 [52] (32 × 32), conditional ImageNet [53] downsampled to 64 × 64, and LSUN Bedroom [54] (256 × 256), using three diffusion model backbones: DDPM [3], DDGAN [46], and the variance-exploding version of EDM [50].
Dataset Splits | Yes | On image generation tasks, we show that DxMI can be used to fine-tune a diffusion model with reduced generation steps, such as T = 4 or 10. We test DxMI on unconditional CIFAR-10 [52] (32 × 32), conditional ImageNet [53] downsampled to 64 × 64, and LSUN Bedroom [54] (256 × 256)... When computing FID, all 50,000 training images of CIFAR-10 are used. To select the best model, we periodically generate 10,000 images for CIFAR-10 and 5,000 images for ImageNet. The checkpoint with the best FID score is selected as the final model. For ImageNet, we use the batch statistics file provided by https://github.com/openai/guided-diffusion. For MVTec-AD, "The training dataset contains normal object images from 15 categories without any labels. The test set consists of both normal and defective object images..."
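The model-selection procedure quoted above (periodically generate samples, score them with FID, keep the best checkpoint) can be sketched in a few lines. This is a hedged illustration, not the authors' code: `compute_fid` is a hypothetical stand-in for an FID scorer over generated samples.

```python
def select_best_checkpoint(checkpoints, compute_fid, n_samples):
    """Return the checkpoint with the lowest FID, as described in the
    quoted protocol (e.g. n_samples = 10,000 for CIFAR-10, 5,000 for
    ImageNet). `compute_fid(ckpt, n)` is a hypothetical callable that
    generates n images from the checkpoint and scores them against the
    reference statistics.
    """
    best_ckpt, best_fid = None, float("inf")
    for ckpt in checkpoints:
        fid = compute_fid(ckpt, n_samples)
        if fid < best_fid:  # lower FID is better
            best_ckpt, best_fid = ckpt, fid
    return best_ckpt, best_fid
```

In practice the reference statistics would come from the full training set (all 50,000 CIFAR-10 images) or, for ImageNet, from the batch statistics file noted above.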
Hardware Specification | Yes | In practice, our CIFAR-10 experiment completes in under 24 hours on two A100 GPUs, while the ImageNet 64 experiment takes approximately 48 hours on four A100 GPUs.
Software Dependencies | No | The paper mentions optimizers (Adam, RAdam) and mixed precision training but does not provide specific software dependencies with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python version).
Experiment Setup | Yes | We set τ₁ = 0.1 and τ₂ = 0.01. The sigmoid time cost is used for all image generation experiments. For all runs, we use a batch size of 128. In the CIFAR-10 experiments, we use the Adam optimizer with a learning rate of 10⁻⁷ for the sampler weights, 10⁻⁵ for the value weights, and 10⁻⁵ for the σ_t.
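The per-component learning rates quoted above map naturally onto parameter groups of the kind one would pass to an Adam-style optimizer (e.g. `torch.optim.Adam`). A minimal sketch, with illustrative group names ("sampler", "value", "sigma") that are assumptions, not identifiers from the paper's code:

```python
# Hyperparameters quoted in the reproducibility response (CIFAR-10 runs).
TAU_1 = 0.1     # entropy temperature tau_1
TAU_2 = 0.01    # entropy temperature tau_2
BATCH_SIZE = 128

# Per-module learning rates, arranged as an optimizer parameter-group
# list. Group names are hypothetical; the "params" entries would hold
# the actual model parameters in a real training script.
param_groups = [
    {"name": "sampler", "lr": 1e-7},  # diffusion sampler weights
    {"name": "value",   "lr": 1e-5},  # value network weights
    {"name": "sigma",   "lr": 1e-5},  # per-step noise scales sigma_t
]

def lr_for(name):
    """Look up the learning rate configured for a named group."""
    return next(g["lr"] for g in param_groups if g["name"] == name)
```

With real parameters attached, the same list could be passed directly as `torch.optim.Adam(param_groups)`, since PyTorch optimizers accept per-group `lr` overrides.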