Multi-Architecture Multi-Expert Diffusion Models

Authors: Yunsung Lee, JinYoung Kim, Hyojun Go, Myeongho Jeong, Shinhyeok Oh, Seungtaek Choi

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Empirically, MEME operates 3.3 times faster than baselines while improving image generation quality (FID scores) by 0.62 (FFHQ) and 0.37 (CelebA). Though we validate the effectiveness of assigning a more optimal architecture per time-step, where efficient models outperform the larger models, we argue that MEME opens a new design choice for diffusion models that can be easily applied in other scenarios, such as large multi-expert models. |
| Researcher Affiliation | Industry | ¹Riiid AI Research, ²Wrtn Technologies, ³Twelvelabs, ⁴Yanolja |
| Pseudocode | No | The paper provides architectural diagrams and mathematical equations but no structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a concrete link or explicit statement about the availability of the source code for the MEME methodology itself. It only references third-party repositories for pretrained models and FID calculation tools. |
| Open Datasets | Yes | We evaluated the unconditional generation of models on two datasets, FFHQ (Karras, Laine, and Aila 2019) and CelebA-HQ (Karras et al. 2018). |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | Yes | All experiments were conducted on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions software components such as the AdamW optimizer and the Clean-FID implementation but does not provide specific version numbers for these or other critical software dependencies. |
| Experiment Setup | Yes | We set the number of experts N to 4 for all multi-expert settings, including MEME. [...] We primarily utilize the AdamW optimizer (Loshchilov and Hutter 2017). The base learning rate is set according to the original LDM (Rombach et al. 2022). Notably, our smaller models employ a setting that doubles the batch size [...] We configured the probabilities: p1 = 0.8, p2 = 0.4, p3 = 0.2, and p4 = 0.1. [...] We trained a lightweight version of ADM [...] on the CelebA-64 dataset with batch size 8 and 200K iterations. |
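As a reading aid for the Research Type and Experiment Setup rows above: MEME assigns a different expert architecture to each contiguous interval of diffusion timesteps, so only one (often smaller) model runs at any given step. The sketch below illustrates that per-interval dispatch under stated assumptions; since the paper releases no code, the class and parameter names (`MultiExpertDenoiser`, `experts`, `num_timesteps`) are illustrative and not MEME's actual API.

```python
# Illustrative sketch only: no official MEME codebase exists, so all names
# below are assumptions rather than the authors' implementation.
import torch
import torch.nn as nn


class MultiExpertDenoiser(nn.Module):
    """Route each denoising step to one of N experts by timestep interval.

    Mirrors the reported setting of N = 4 experts, where each expert covers a
    contiguous slice of the T diffusion timesteps and may use a different
    architecture (e.g., different channel widths or attention placement).
    """

    def __init__(self, experts: nn.ModuleList, num_timesteps: int = 1000):
        super().__init__()
        self.experts = experts                  # e.g., 4 U-Nets with distinct architectures
        self.num_timesteps = num_timesteps
        self.interval = num_timesteps // len(experts)

    def expert_index(self, t: torch.Tensor) -> int:
        # Assumes the whole batch shares one timestep, as in standard ancestral sampling.
        idx = int(t[0].item()) // self.interval
        return min(idx, len(self.experts) - 1)

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Only the expert responsible for this interval runs, so the per-step cost
        # is that of a single (possibly smaller) model rather than one large U-Net.
        return self.experts[self.expert_index(t)](x_t, t)
```

During ancestral sampling the timestep decreases from T-1 to 0, so high-noise and low-noise steps are handled by different experts; the reported efficiency gain comes from each step invoking only its interval's expert, which can be a lighter model where a large network is unnecessary.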
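The Experiment Setup row can likewise be read as a training configuration. Below is a minimal, hypothetical sketch that gathers the reported values in one place; the field names and the `build_optimizer` helper are assumptions of this writeup, and the base learning rate is deliberately left unset because the paper only defers it to the original LDM configuration.

```python
# Hypothetical configuration mirroring the reported setup; field names are
# assumptions, since no official MEME codebase is available.
from dataclasses import dataclass
from typing import Tuple

import torch


@dataclass
class MEMETrainConfig:
    base_lr: float                          # "set according to the original LDM"; value not given in the excerpt
    num_experts: int = 4                    # N = 4 experts for all multi-expert settings
    batch_size_multiplier: int = 2          # smaller experts train with a doubled batch size
    probs: Tuple[float, float, float, float] = (0.8, 0.4, 0.2, 0.1)  # reported p1..p4; role as described in the paper


def build_optimizer(model: torch.nn.Module, cfg: MEMETrainConfig) -> torch.optim.Optimizer:
    # AdamW, as reported (Loshchilov and Hutter 2017); other optimizer
    # hyperparameters are left at PyTorch defaults in this sketch.
    return torch.optim.AdamW(model.parameters(), lr=cfg.base_lr)
```

In use, `base_lr` would be filled in from the base LDM configuration; whether each expert receives its own optimizer is a detail the excerpt does not specify, so this sketch leaves that choice to the caller.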