Muse: Text-To-Image Generation via Masked Generative Transformers
Authors: Huiwen Chang, Han Zhang, Jarred Barber, Aaron Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Patrick Murphy, William T. Freeman, Michael Rubinstein, Yuanzhen Li, Dilip Krishnan
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance... Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. |
| Researcher Affiliation | Industry | Google Research |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | No | The project page is referenced ("More results and videos demonstrating editing are available at http://muse-icml.github.io"), but the authors state: "Due to these important considerations, we opt to not release code or a public demo at this point in time." |
| Open Datasets | Yes | We train on the Imagen dataset, consisting of 860M text-image pairs (Saharia et al., 2022). In Table 1 and Table 2, we show our performance against other methods on the CC3M (Sharma et al., 2018) and COCO (Lin et al., 2014) datasets. |
| Dataset Splits | No | The paper mentions using established datasets like Imagen, CC3M, and COCO for training and evaluation, but it does not provide specific details on how these datasets were split into training, validation, and test sets (e.g., exact percentages or sample counts for each split, or references to predefined split files). |
| Hardware Specification | Yes | Each image was generated in 1.4s on a TPUv4 chip. |
| Software Dependencies | No | The paper mentions using optimizers like Adafactor (Shazeer & Stern, 2018) and Adam (Kingma & Ba, 2015), and specific learning rate schedules (cosine decay), but does not provide details on software dependencies such as programming languages, libraries, or frameworks with their specific version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | Training is performed for 1M steps, with a batch size of 512 on 512-core TPU-v4 chips (Jouppi et al., 2020). |
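The Experiment Setup and Software Dependencies rows above quote the training configuration (Adafactor/Adam optimizers, a cosine learning-rate decay, 1M steps at batch size 512) without further detail. Below is a minimal, self-contained sketch of such a cosine-decay schedule under those quoted settings; the peak learning rate and warmup length are illustrative assumptions, not values reported in the paper.

```python
# Hypothetical sketch of the quoted training schedule (1M steps, batch size 512,
# cosine learning-rate decay). PEAK_LR and WARMUP_STEPS are assumptions for
# illustration only; the paper does not specify them in the quoted text.
import math

TOTAL_STEPS = 1_000_000   # "Training is performed for 1M steps"
BATCH_SIZE = 512          # "with a batch size of 512"
PEAK_LR = 1e-4            # assumption: not given in the quoted text
WARMUP_STEPS = 10_000     # assumption: not given in the quoted text


def learning_rate(step: int) -> float:
    """Cosine decay with linear warmup, a generic stand-in for the
    schedule the paper names but does not fully specify."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))


if __name__ == "__main__":
    for s in (0, 10_000, 500_000, 1_000_000):
        print(f"step {s:>9,d}: lr = {learning_rate(s):.2e}")
```

This only illustrates the shape of the reported schedule; reproducing the paper's FID/CLIP results would additionally require the 860M text-image pairs and TPU-v4 scale quoted in the table.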