Subject-driven Text-to-Image Generation via Apprenticeship Learning

Authors: Wenhu Chen, Hexiang Hu, Yandong Li, Nataniel Ruiz, Xuhui Jia, Ming-Wei Chang, William W. Cohen

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform a comprehensive set of automatic and human evaluations to show the capability of our model on generating highly faithful and creative images on DreamBench and DreamBench-v2."
Researcher Affiliation | Industry | Google DeepMind, Google Research; {wenhuchen,hexiang,mingweichang,wcohen}@google.com
Pseudocode | Yes | "Algorithm 1: Apprenticeship Learning from a Large Crowd of Specialized Expert Models" (a hedged sketch of this loop follows the table)
Open Source Code | No | "To facilitate the reproducibility of our model performance, we release the SuTI model API as a Google Cloud Vertex AI model service, under the production name Instant tuning", generally available at https://cloud.google.com/vertex-ai/docs/generative-ai/image/fine-tune-model. This provides access to the model via an API, but not to its source code in a repository.
Open Datasets | Yes | "We construct the seed dataset using the WebLI [10, 20] dataset."
Dataset Splits | No | The paper constructs a training dataset G from expert models but does not specify a validation split of this dataset used while training the apprentice model.
Hardware Specification | Yes | "We tune each model on a single TPU core (32 GB)... The apprentice training is performed on 128 Cloud TPU v4 chips."
Software Dependencies | No | The paper mentions optimizers (Adafactor) and models (CLIP ViT-L/14, Imagen checkpoint) but does not provide specific version numbers for software libraries or environments like Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | "We tune each model on a single TPU core (32 GB) for 500 steps using Adafactor optimizer with a learning rate of 1e-5... We train the model for a total of 150K steps. We use an Adafactor optimizer with a learning rate of 1e-4. We use 3 demonstrations during training... We use a lower classifier-free guidance weight of 15 with DDPM [14] sampling strategy." (the reported hyperparameters are collected into a config sketch after the table)
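
The Pseudocode row above refers to the paper's Algorithm 1, which fine-tunes one expert per subject and distills the experts' outputs into a single in-context apprentice. The following is a minimal Python sketch of that loop under stated assumptions: the helper callables (finetune, generate, clip_score, sample_demos), the cluster attributes, and the 0.3 filter threshold are illustrative placeholders, not the authors' code; only the overall structure (per-subject expert tuning, generation, quality filtering, 3 demonstrations) follows the paper's description.

```python
from typing import Callable, List, Sequence, Tuple

def build_apprentice_dataset(
    base_model,
    clusters: Sequence,          # subject clusters, each with .images and .captions
    prompts: Sequence[str],
    finetune: Callable,          # (base_model, images, captions) -> expert model
    generate: Callable,          # (expert, prompt) -> image
    clip_score: Callable,        # (image, prompt) -> float alignment score
    sample_demos: Callable,      # (cluster, k) -> k (image, caption) demonstrations
    threshold: float = 0.3,      # illustrative filter threshold, not from the paper
) -> List[Tuple]:
    """Collect (demonstrations, prompt, target image) triples produced by
    per-subject expert models; a single apprentice is then trained on them."""
    dataset = []
    for cluster in clusters:
        # One specialized expert per subject, fine-tuned on its image cluster.
        expert = finetune(base_model, cluster.images, cluster.captions)
        for prompt in prompts:
            image = generate(expert, prompt)
            # Keep only faithful generations, e.g. via a CLIP-based filter.
            if clip_score(image, prompt) >= threshold:
                demos = sample_demos(cluster, 3)  # paper uses 3 demonstrations
                dataset.append((demos, prompt, image))
    return dataset
```

The apprentice then learns to map (demonstrations, prompt) directly to an image, amortizing all per-subject fine-tuning into a single feed-forward model.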
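For convenience, here is a hedged sketch that gathers the hyperparameters quoted in the Experiment Setup row into one place. The dataclass structure is ours; only the values come from the paper. The classifier-free guidance helper assumes the Imagen-style convention for combining conditional and unconditional noise predictions, which is an assumption: the paper reports only the weight (15) and the DDPM sampler.

```python
from dataclasses import dataclass

@dataclass
class ExpertTuning:
    steps: int = 500                 # per-subject fine-tuning steps
    optimizer: str = "adafactor"
    learning_rate: float = 1e-5
    hardware: str = "1 TPU core (32 GB)"

@dataclass
class ApprenticeTraining:
    steps: int = 150_000
    optimizer: str = "adafactor"
    learning_rate: float = 1e-4
    num_demonstrations: int = 3
    hardware: str = "128 Cloud TPU v4 chips"
    guidance_weight: float = 15.0    # classifier-free guidance at sampling
    sampler: str = "ddpm"

def cfg_epsilon(eps_uncond, eps_cond, w: float = 15.0):
    """Classifier-free guidance, assuming the Imagen-style convention:
    eps_hat = eps_uncond + w * (eps_cond - eps_uncond)."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```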