Diffusion PID: Interpreting Diffusion via Partial Information Decomposition
Authors: Shaurya Dewan, Rushikesh Zawar, Prakanshul Saxena, Yingshan Chang, Andrew Luo, Yonatan Bisk
NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our work presents Diffusion Partial Information Decomposition (Diffusion PID), a novel technique that applies information-theoretic principles to decompose the input text prompt into its elementary components, enabling a detailed examination of how individual tokens and their interactions shape the generated image. ... Our results show that PID is a potent tool for evaluating and diagnosing text-to-image diffusion models. |
| Researcher Affiliation | Academia | Shaurya Dewan\*, Rushikesh Zawar\*, Prakanshul Saxena\*, Yingshan Chang, Andrew Luo, Yonatan Bisk; Carnegie Mellon University; (srdewan, rzawar, prakanss, yingshac, afluo, ybisk)@andrew.cmu.edu |
| Pseudocode | No | The paper uses mathematical equations and describes procedures in prose, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Link to project page: https://rbz-99.github.io/Diffusion-PID/. ... We have shared our code and data in the supplementary. We will be releasing the code publicly before the main conference. |
| Open Datasets | Yes | We primarily rely on existing hierarchical datasets, Wordnet [77] and COCO [76]... Stable Diffusion 2.1 [7]. |
| Dataset Splits | No | The paper uses a pre-trained model and generates images for analysis. It does not describe any specific training, validation, or test splits for its own experimental methodology beyond generating data. |
| Hardware Specification | Yes | A single A6000 GPU was used to generate the PID maps for each data sample. |
| Software Dependencies | No | The paper mentions using 'BERT [61]' and 'Stable Diffusion 2.1 model from Hugging Face' but does not provide specific version numbers for these or other software dependencies like Python or deep learning frameworks. |
| Experiment Setup | Yes | For our experiments, we primarily focus on the pre-trained Stable Diffusion 2.1 model from Hugging Face. ... Thus, all our PID computations occur in this latent space... During visualization, the heatmaps are bilinearly interpolated from this latent space to the original image resolution. Finally, we make use of 50 samples for evaluating the integral over SNRs using importance sampling. A single A6000 GPU was used to generate the PID maps for each data sample. |
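
The two numerical steps quoted in the Experiment Setup row (a 50-sample importance-sampling estimate of an integral over SNRs, and bilinear upsampling of latent-space heatmaps) can be illustrated with a minimal NumPy sketch. This is not the authors' code. The logistic proposal over log-SNR, its `loc` and `scale` parameters, the toy integrand, the corner-aligned interpolation grid, and the function names `integrate_over_logsnr` and `upsample_bilinear` are all assumptions made for illustration.

```python
import numpy as np


def logistic_pdf(x, loc, scale):
    """Density of a logistic distribution, written as 1 / (s * (2 + 2 cosh z))."""
    z = (x - loc) / scale
    return 1.0 / (scale * (2.0 + 2.0 * np.cosh(z)))


def integrate_over_logsnr(integrand, n_samples=50, loc=0.0, scale=1.0, rng=None):
    """Importance-sampling estimate of the integral of `integrand` over log-SNR.

    Assumption: samples are drawn from a logistic proposal q; the estimate is
    the mean of integrand(alpha) / q(alpha). The default of 50 samples mirrors
    the count quoted in the table; the proposal itself is hypothetical.
    """
    rng = np.random.default_rng() if rng is None else rng
    alphas = rng.logistic(loc, scale, size=n_samples)
    weights = 1.0 / logistic_pdf(alphas, loc, scale)
    return float(np.mean(integrand(alphas) * weights))


def upsample_bilinear(heatmap, out_h, out_w):
    """Bilinearly interpolate a 2D heatmap to (out_h, out_w).

    Assumption: input and output corners are aligned; other conventions
    (for example half-pixel offsets) give slightly different values.
    """
    h, w = heatmap.shape
    ys = np.linspace(0.0, h - 1.0, out_h)
    xs = np.linspace(0.0, w - 1.0, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical blend weights
    wx = (xs - x0)[None, :]  # horizontal blend weights
    top = heatmap[y0][:, x0] * (1 - wx) + heatmap[y0][:, x1] * wx
    bottom = heatmap[y1][:, x0] * (1 - wx) + heatmap[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy integrand standing in for a per-SNR quantity (hypothetical).
    estimate = integrate_over_logsnr(lambda a: np.exp(-a**2 / 2), rng=rng)
    print("50-sample estimate:", estimate, "exact:", np.sqrt(2 * np.pi))

    # Hypothetical 64x64 latent heatmap upsampled to a 512x512 image grid.
    latent_map = rng.random((64, 64))
    print(upsample_bilinear(latent_map, 512, 512).shape)
```

With only 50 samples the estimate is noisy, so the variance of the ratio `integrand / q` depends heavily on how well the proposal matches the integrand.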