Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Maximum Entropy Inverse Reinforcement Learning of Diffusion Models with Energy-Based Models
Authors: Sangwoong Yoon, Himchan Hwang, Dohyun Kwon, Yung-Kyun Noh, Frank Park
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide experimental results demonstrating the effectiveness of DxMI in training diffusion models and EBMs. On image generation tasks, DxMI can train strong short-run diffusion models that generate samples in 4 or 10 neural network evaluations. Also, DxMI can be used to train strong energy-based anomaly detectors. |
| Researcher Affiliation | Collaboration | Sangwoong Yoon1, Himchan Hwang2, Dohyun Kwon1,3, Yung-Kyun Noh1,4, Frank C. Park2,5 (1Korea Institute for Advanced Study, 2Seoul National University, 3University of Seoul, 4Hanyang University, 5Saige Research) |
| Pseudocode | Yes | Algorithm 1 Diffusion by Maximum Entropy IRL; Algorithm 2 Diffusion by Maximum Entropy IRL for Image Generation |
| Open Source Code | Yes | The code for DxMI can be found in https://github.com/swyoon/Diffusion-by-MaxEntIRL.git. |
| Open Datasets | Yes | On image generation tasks, we show that DxMI can be used to fine-tune a diffusion model with reduced generation steps, such as T = 4 or 10. We test DxMI on unconditional CIFAR-10 [52] (32 × 32), conditional ImageNet [53] downsampled to 64 × 64, and LSUN Bedroom [54] (256 × 256), using three diffusion model backbones, DDPM [3], DDGAN [46], and the variance exploding version of EDM [50]. |
| Dataset Splits | Yes | On image generation tasks, we show that DxMI can be used to fine-tune a diffusion model with reduced generation steps, such as T = 4 or 10. We test DxMI on unconditional CIFAR-10 [52] (32 × 32), conditional ImageNet [53] downsampled to 64 × 64, and LSUN Bedroom [54] (256 × 256)... When computing FID, the whole 50,000 training images of CIFAR-10 are used. To select the best model, we periodically generate 10,000 images for CIFAR-10 and 5,000 images for ImageNet. The checkpoint with the best FID score is selected as the final model. For ImageNet, we use the batch stat file provided by https://github.com/openai/guided-diffusion. For MVTec-AD, "The training dataset contains normal object images from 15 categories without any labels. The test set consists of both normal and defective object images..." |
| Hardware Specification | Yes | In practice, our CIFAR-10 experiment completes in under 24 hours on two A100 GPUs, while the ImageNet 64 experiment takes approximately 48 hours on four A100 GPUs. |
| Software Dependencies | No | The paper mentions optimizers (Adam, RAdam) and mixed precision training but does not provide specific software dependencies with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x, Python version). |
| Experiment Setup | Yes | We set Ο1 = 0.1 and Ο2 = 0.01. The sigmoid time cost is used for all image generation experiments. For all runs, we use a batch size of 128. In the CIFAR-10 experiments, we use the Adam optimizer with a learning rate of 10⁻⁷ for the sampler weights, 10⁻⁵ for the value weights, and 10⁻⁵ for the Οt's. |
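The pseudocode row above points to an alternating scheme: an energy-based model (the learned reward) is updated to assign lower energy to data than to samples, while the sampler is fine-tuned to minimize energy while maximizing entropy. As a loose 1-D illustration of that alternation only (not the paper's actual algorithm), the toy below uses a quadratic energy E_θ(x) = θx² and a zero-mean Gaussian sampler whose scale is fine-tuned; every name and constant is a hypothetical stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D max-entropy IRL loop: energy E_theta(x) = theta * x**2,
# sampler = N(0, s**2). Both theta and s are learned jointly.
theta, s = 0.1, 3.0                      # hypothetical initial values
lr_energy, lr_sampler = 1e-2, 1e-2
data = rng.normal(0.0, 1.0, size=512)    # stand-in for the training set

for step in range(2000):
    samples = rng.normal(0.0, s, size=512)
    # Energy update (maximum-likelihood EBM gradient): push energy
    # down on data, up on model samples.
    grad_theta = np.mean(data**2) - np.mean(samples**2)
    theta -= lr_energy * grad_theta
    # Sampler update: minimize expected energy minus entropy.
    # For N(0, s^2): E[E_theta(x)] = theta * s^2, H = 0.5*log(2*pi*e*s^2),
    # so d/ds (theta*s^2 - H) = 2*theta*s - 1/s.
    grad_s = 2.0 * theta * s - 1.0 / s
    s -= lr_sampler * grad_s

# At the joint fixed point, s^2 matches the data variance (about 1)
# and theta approaches 1/(2*s^2), i.e. E recovers the data log-density.
```

The fixed point is where the sampler's distribution matches the Boltzmann distribution of the learned energy, which in turn matches the data, mirroring the min-max structure of IRL.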
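The model-selection protocol quoted in the Dataset Splits row (periodically generate a fixed number of samples, compute FID against the full training set, keep the checkpoint with the best score) can be sketched with a scalar stand-in for FID. The 1-D Fréchet distance below is the same Gaussian formula FID applies to Inception features; the "checkpoints" and their sample scales are invented for illustration.

```python
import numpy as np

def frechet_distance_1d(x, y):
    """Frechet distance between 1-D Gaussian fits of two samples
    (the quantity FID computes on Inception-feature statistics)."""
    m1, v1 = x.mean(), x.var()
    m2, v2 = y.mean(), y.var()
    return (m1 - m2) ** 2 + v1 + v2 - 2.0 * np.sqrt(v1 * v2)

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=50_000)  # stands in for the 50k training images

# Hypothetical "checkpoints": samplers whose sample distribution drifts
# toward the reference as training proceeds, then degrades slightly.
best_fid, best_ckpt = float("inf"), None
for ckpt, scale in enumerate([3.0, 2.0, 1.5, 1.05, 1.2]):
    generated = rng.normal(0.0, scale, size=10_000)  # 10k eval samples, as in the paper
    fid = frechet_distance_1d(reference, generated)
    if fid < best_fid:
        best_fid, best_ckpt = fid, ckpt

# The checkpoint with the lowest FID is kept as the final model.
```

Here the fourth checkpoint (scale 1.05, closest to the reference) wins; the real pipeline does the same comparison with Inception features and the guided-diffusion batch-stat file for ImageNet.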
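The setup row assigns different Adam learning rates to different parameter groups (sampler weights vs. value weights). A minimal NumPy sketch of per-group Adam updates on a toy quadratic loss is given below; the group names, rates, and objective are hypothetical, not the paper's.

```python
import numpy as np

def adam_step(p, g, m, v, t, lr, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (p, m, v)."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)            # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)            # bias-corrected second moment
    return p - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Two parameter groups with distinct learning rates, mirroring the use of
# separate rates for sampler and value weights (toy loss: p**2 for each).
groups = {"sampler": {"p": 2.0, "lr": 5e-2, "m": 0.0, "v": 0.0},
          "value":   {"p": 2.0, "lr": 5e-3, "m": 0.0, "v": 0.0}}
for t in range(1, 1001):
    for group in groups.values():
        grad = 2.0 * group["p"]          # gradient of p**2
        group["p"], group["m"], group["v"] = adam_step(
            group["p"], grad, group["m"], group["v"], t, group["lr"])
```

Because Adam normalizes by the gradient's running magnitude, each group moves roughly lr per step toward its minimum, which is why per-group rates (10⁻⁷ vs. 10⁻⁵ in the paper) directly control how fast each component adapts.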