Multi-Architecture Multi-Expert Diffusion Models
Authors: Yunsung Lee, JinYoung Kim, Hyojun Go, Myeongho Jeong, Shinhyeok Oh, Seungtaek Choi
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, MEME operates 3.3 times faster than baselines while improving image generation quality (FID scores) by 0.62 (FFHQ) and 0.37 (CelebA). Though we validate the effectiveness of assigning a more suitable architecture per time-step, where efficient models outperform the larger models, we argue that MEME opens a new design choice for diffusion models that can be easily applied in other scenarios, such as large multi-expert models. (A hedged sketch of this per-time-step expert assignment appears after the table.) |
| Researcher Affiliation | Industry | ¹Riiid AI Research, ²Wrtn Technologies, ³Twelvelabs, ⁴Yanolja |
| Pseudocode | No | The paper provides architectural diagrams and mathematical equations but no structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a concrete link or explicit statement about the availability of the source code for the MEME methodology itself. It only references third-party repositories for pretrained models and FID calculation tools. |
| Open Datasets | Yes | We evaluated the unconditional generation of models on two datasets, FFHQ (Karras, Laine, and Aila 2019) and CelebA-HQ (Karras et al. 2018). |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | Yes | All experiments were conducted on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions software components like 'AdamW optimizer' and 'Clean-FID implementation' but does not provide specific version numbers for these or other critical software dependencies. |
| Experiment Setup | Yes | We set the number of experts N to 4 for all multi-expert settings, including MEME. [...] We primarily utilize the AdamW optimizer (Loshchilov and Hutter 2017). The base learning rate is set according to the original LDM (Rombach et al. 2022). Notably, our smaller models employ a setting that doubles the batch size [...] We configured the probabilities: p1 = 0.8, p2 = 0.4, p3 = 0.2, and p4 = 0.1. [...] We trained a lightweight version of ADM [...] on the CelebA-64 dataset with batch size 8 and 200K iterations. (A hedged configuration sketch follows the table.) |
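
The paper's core idea, as quoted in the Research Type row, is to route each diffusion time-step interval to a differently sized denoiser architecture. The following Python sketch illustrates that routing pattern under our own assumptions: the class name `TimestepExpertRouter`, the uniform split of time-steps into N contiguous intervals, and the call signature `expert(x_t, t)` are illustrative choices, since the authors do not release code.

```python
# Hedged sketch of per-time-step expert routing in a MEME-style multi-expert
# diffusion model. Names and interfaces here are assumptions, not the authors'
# implementation (no source code is available, per the table above).
import torch


class TimestepExpertRouter:
    def __init__(self, experts, num_timesteps=1000):
        # experts: list of N denoiser modules; experts[i] handles time-steps
        # in the interval [i * interval, (i + 1) * interval).
        self.experts = experts
        self.num_timesteps = num_timesteps
        self.interval = num_timesteps // len(experts)

    def expert_for(self, t: int) -> torch.nn.Module:
        # Map a scalar time-step to the expert that owns its interval.
        idx = min(t // self.interval, len(self.experts) - 1)
        return self.experts[idx]

    @torch.no_grad()
    def denoise_step(self, x_t: torch.Tensor, t: int) -> torch.Tensor:
        # One reverse-diffusion step using the interval-specific expert.
        expert = self.expert_for(t)
        t_batch = torch.full((x_t.shape[0],), t, dtype=torch.long, device=x_t.device)
        return expert(x_t, t_batch)
```

In this reading, smaller (cheaper) experts can be assigned to the intervals where they match or beat a single large model, which is how the reported speed-up and FID gains would arise.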
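
The Experiment Setup row quotes several hyperparameters (N = 4 experts, AdamW, a doubled batch size for the smaller models, probabilities p1 through p4, and a lightweight ADM baseline on CelebA-64). The minimal configuration sketch below only restates those quoted values; the dataclass and its field names are our own, and the base learning rate is kept symbolic because the paper defers it to the original LDM setup.

```python
# Hedged configuration sketch echoing the quoted experiment setup.
# Field names are illustrative; values come only from the table above.
from dataclasses import dataclass, field
from typing import List


@dataclass
class MEMETrainingConfig:
    num_experts: int = 4                     # N = 4 for all multi-expert settings
    optimizer: str = "AdamW"                 # Loshchilov and Hutter 2017
    base_lr_source: str = "original LDM"     # base learning rate follows Rombach et al. 2022
    batch_size_multiplier: int = 2           # smaller models double the batch size
    expert_probabilities: List[float] = field(
        default_factory=lambda: [0.8, 0.4, 0.2, 0.1]  # p1..p4 as quoted
    )
    adm_baseline_dataset: str = "CelebA-64"  # lightweight ADM baseline
    adm_baseline_batch_size: int = 8
    adm_baseline_iterations: int = 200_000   # 200K iterations


config = MEMETrainingConfig()
```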