Re-Imagen: Retrieval-Augmented Text-to-Image Generator

Authors: Wenhu Chen, Hexiang Hu, Chitwan Saharia, William W. Cohen

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train Re-Imagen on a constructed dataset containing (image, text, retrieval) triples to teach the model to ground on both the text prompt and the retrieval. Furthermore, we develop a new sampling strategy that interleaves the classifier-free guidance for the text and retrieval conditions to balance text and retrieval alignment. Re-Imagen achieves significant gains in FID score on COCO and WikiImages. To further evaluate the capabilities of the model, we introduce EntityDrawBench, a new benchmark that evaluates image generation for diverse entities, from frequent to rare, across multiple object categories including dogs, foods, landmarks, birds, and characters. Human evaluation on EntityDrawBench shows that Re-Imagen significantly improves the fidelity of generated images, especially for less frequent entities.
Researcher Affiliation | Industry | Wenhu Chen, Hexiang Hu, Chitwan Saharia, William W. Cohen — Google Research — {wenhuchen,hexiang,sahariac,wcohen}@google.com
Pseudocode | No | The paper describes the model architecture and processes using natural language and diagrams, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | Considering such potential threats to the public, we will be cautious about code and API release. In future work, we will explore a framework for responsible use that balances the value of external auditing of research with the risks of unrestricted open access, allowing this work to be used in a safe and beneficial way.
Open Datasets | Yes | Re-Imagen achieves significant gains in FID score on COCO and WikiImages. To further evaluate the capabilities of the model, we introduce EntityDrawBench, a new benchmark that evaluates image generation for diverse entities, from frequent to rare, across multiple object categories including dogs, foods, landmarks, birds, and characters. Human evaluation on EntityDrawBench shows that Re-Imagen significantly improves the fidelity of generated images, especially for less frequent entities.
Dataset Splits | Yes | We randomly sample 30K prompts from the validation set as input to the model. The generated images are compared with the reference images from the full validation set (42K). We randomly sample 22K as our validation set; to perform zero-shot evaluation, we further sample 20K prompts from the dataset as input.
Hardware Specification | Yes | The fine-tuning was run for 200K steps on 64 TPU-v4 chips and completed within two days. The inference speed is 30-40 secs for 4 images on 4 TPU-v4 chips.
Software Dependencies | No | The paper mentions software components like T5 embeddings, BM25, CLIP, ScaNN, Adafactor, and Adam, but it does not specify version numbers for these dependencies.
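For context on the BM25 retrieval criterion mentioned among the paper's components, a minimal pure-Python sketch of the standard Okapi BM25 scoring formula is shown below. The parameter values k1=1.5 and b=0.75 are common defaults, not values reported in the paper, and this toy scorer stands in for the paper's actual retrieval over a large multimodal knowledge base.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with BM25.

    Minimal illustration of the Okapi BM25 formula; k1 and b are
    common defaults, not values taken from the Re-Imagen paper.
    """
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency of each term across the corpus.
    df = Counter()
    for d in docs_tokens:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores
```

A document containing both query terms outranks one containing only one of them, and documents sharing no terms with the query score zero.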
Experiment Setup | Yes | The guidance weight w for the 64×64 model is swept over [1.0, 1.25, 1.5, 1.75, 2.0], while the 256×256 super-resolution model's guidance weight w is swept over [1.0, 5.0, 8.0, 10.0]. We set the number of neighbors k=2 and set γ=BM25 during training. The fine-tuning was run for 200K steps... We use Adafactor for the 64×64 model and Adam for the 256×256 super-resolution model, with a learning rate of 1e-4. The 64×64 diffusion model runs for 256 diffusion steps under a strong guidance weight of w=30 for both text and neighbor conditions. For the 256×256 and 1024×1024 resolution models, we use constant guidance weights of 5.0 and 3.0, respectively, with 128 and 32 diffusion steps.
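The interleaved classifier-free guidance described above can be sketched as follows. This is an illustrative sketch only: the alternating even/odd schedule, the function name, and the scalar noise predictions are assumptions for clarity, not the paper's exact sampling recipe, which applies guidance to the denoiser's outputs at each diffusion step.

```python
def interleaved_cfg(eps_uncond, eps_text, eps_retr, step, w_text, w_retr):
    """One classifier-free guidance update, alternating conditions.

    Illustrative sketch: on even steps, guide toward the text
    condition; on odd steps, toward the retrieval condition. The
    alternating schedule is an assumption, not the paper's exact
    recipe. Operates elementwise on scalars or arrays.
    """
    if step % 2 == 0:
        return eps_uncond + w_text * (eps_text - eps_uncond)
    return eps_uncond + w_retr * (eps_retr - eps_uncond)
```

Interleaving the two guidance directions, rather than summing them every step, lets the sampler balance text alignment against faithfulness to the retrieved neighbors, which is the trade-off the paper's guidance-weight sweeps explore.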