Compositional Image Decomposition with Diffusion Models
Authors: Jocelin Su, Nan Liu, Yanbo Wang, Joshua B. Tenenbaum, Yilun Du
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | From Section 4 (Experiments): In this section, we evaluate the ability of our approach to decompose images. First, we assess decomposition of images into global factors of variation in Section 4.2. We next evaluate decomposition of images into local factors of variation in Section 4.3. We further investigate the ability of decomposed components to recombine across separate trained models in Section 4.4. Finally, we illustrate how our approach can be adapted to pretrained models in Section 4.5. For quantitative evaluation of image quality, we employ Fréchet Inception Distance (FID) (Heusel et al., 2017), Kernel Inception Distance (KID) (Bińkowski et al., 2018), and LPIPS (Zhang et al., 2018) on images reconstructed from CelebA-HQ (Karras et al., 2017), Falcor3D (Nie et al., 2020), Virtual KITTI 2 (Cabon et al., 2020), and CLEVR (Johnson et al., 2017). |
| Researcher Affiliation | Academia | Jocelin Su (MIT), Nan Liu (UIUC), Yanbo Wang (TU Delft), Joshua B. Tenenbaum (MIT), Yilun Du (MIT). |
| Pseudocode | Yes | Algorithm 1 (Training Algorithm) and Algorithm 2 (Image Generation Algorithm); hedged sketches of both follow the table. |
| Open Source Code | Yes | Code and visualizations are at https://energy-based-model.github.io/decomp-diffusion. |
| Open Datasets | Yes | For quantitative evaluation of image quality, we employ Fréchet Inception Distance (FID) (Heusel et al., 2017), Kernel Inception Distance (KID) (Bińkowski et al., 2018), and LPIPS (Zhang et al., 2018) on images reconstructed from CelebA-HQ (Karras et al., 2017), Falcor3D (Nie et al., 2020), Virtual KITTI 2 (Cabon et al., 2020), and CLEVR (Johnson et al., 2017). An illustrative metric-evaluation sketch follows the table. |
| Dataset Splits | No | The paper mentions total dataset sizes (e.g., 'CLEVR 10K', 'Celeb A-HQ 30K') but does not specify explicit training, validation, or test splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | Yes | Each model is trained for 24 hours on an NVIDIA V100 32GB machine or an NVIDIA GeForce RTX 2080 24GB machine. |
| Software Dependencies | No | The paper mentions a 'standard U-Net architecture' and references codebases for baselines, but it does not specify version numbers for general software dependencies such as Python, PyTorch, or CUDA for its own implementation. |
| Experiment Setup | Yes | We used standard denoising training to train our denoising networks, with 1000 diffusion steps and a squared cosine beta schedule. In our implementation, the denoising network ϵθ is trained to directly predict the original image x0, since we show this leads to better performance due to the similarity between our training objective and autoencoder training. [...] We use a batch size of 32 when training. (See the training-step sketch below.) |
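
For concreteness, here is a minimal PyTorch sketch of the training step the paper describes (Algorithm 1): 1000 diffusion steps, a squared cosine beta schedule, and a network regressed directly onto x0. The names `encoder` and `denoiser`, and the summation of per-component predictions, are assumed interfaces consistent with the paper's compositional formulation, not its exact code.

```python
import math
import torch
import torch.nn.functional as F

T = 1000  # diffusion steps, as stated in the paper

def cosine_alpha_bar(T: int, s: float = 0.008) -> torch.Tensor:
    """Squared-cosine noise schedule (Nichol & Dhariwal, 2021):
    alpha_bar_t ∝ cos^2(((t/T + s) / (1 + s)) * pi / 2),
    normalized so alpha_bar_0 = 1."""
    t = torch.linspace(0, T, T + 1) / T
    f = torch.cos((t + s) / (1 + s) * math.pi / 2) ** 2
    return (f / f[0])[1:].clamp(1e-5, 1.0)  # alpha_bar for t = 1..T

alpha_bar = cosine_alpha_bar(T)

def training_step(x0, encoder, denoiser, optimizer):
    """One denoising step in which the network predicts x0 directly.
    `encoder` (image -> K component latents) and `denoiser(x_t, t, z)`
    (a latent-conditioned U-Net) are hypothetical interfaces."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,), device=x0.device)
    a = alpha_bar.to(x0.device)[t].view(b, 1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise  # forward process q(x_t | x0)

    z_components = encoder(x0)  # K latents inferred from the clean image
    x0_pred = sum(denoiser(x_t, t, z) for z in z_components)

    loss = F.mse_loss(x0_pred, x0)  # regress x0 itself, not the noise
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```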
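Algorithm 2 (Image Generation) can then be approximated by running the reverse process with a composed set of component latents. The sketch below reuses `T`, `alpha_bar`, and the hypothetical `denoiser` from the training sketch and substitutes a deterministic DDIM-style update for brevity; the paper's actual sampler may differ in these details.

```python
@torch.no_grad()
def generate(denoiser, z_components, shape, device="cpu"):
    """Compose an arbitrary set of component latents (possibly taken
    from different images or, as in Section 4.4, from separately
    trained models) into a single image."""
    x_t = torch.randn(shape, device=device)
    ab = alpha_bar.to(device)
    for t in reversed(range(T)):
        t_b = torch.full((shape[0],), t, device=device, dtype=torch.long)
        # Composed prediction: sum of per-component x0 estimates.
        x0_pred = sum(denoiser(x_t, t_b, z) for z in z_components)
        x0_pred = x0_pred.clamp(-1.0, 1.0)
        # Recover the implied noise, then step to t-1 deterministically.
        eps = (x_t - ab[t].sqrt() * x0_pred) / (1.0 - ab[t]).sqrt()
        ab_prev = ab[t - 1] if t > 0 else torch.tensor(1.0, device=device)
        x_t = ab_prev.sqrt() * x0_pred + (1.0 - ab_prev).sqrt() * eps
    return x_t
```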
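The reported FID, KID, and LPIPS numbers could be reproduced along the following lines with `torchmetrics` (install with the `torchmetrics[image]` extras). The paper does not state which metric implementation it used, so treat this as an illustrative setup rather than the authors' evaluation code; the placeholder tensors stand in for real images and their reconstructions.

```python
import torch
from torchmetrics.image import (FrechetInceptionDistance,
                                KernelInceptionDistance,
                                LearnedPerceptualImagePatchSimilarity)

# Placeholder batches standing in for dataset images and their
# reconstructions: float values in [0, 1], shape [N, 3, H, W].
real = torch.rand(64, 3, 128, 128)
recon = (real + 0.05 * torch.randn_like(real)).clamp(0, 1)

fid = FrechetInceptionDistance(feature=2048, normalize=True)
kid = KernelInceptionDistance(subset_size=50, normalize=True)  # subset_size <= N
lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex", normalize=True)

fid.update(real, real=True); fid.update(recon, real=False)
kid.update(real, real=True); kid.update(recon, real=False)

print("FID:", fid.compute().item())
kid_mean, kid_std = kid.compute()  # KID reports a mean and std over subsets
print("KID:", kid_mean.item())
print("LPIPS:", lpips(recon, real).item())
```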