Subject-driven Text-to-Image Generation via Apprenticeship Learning
Authors: Wenhu Chen, Hexiang Hu, Yandong Li, Nataniel Ruiz, Xuhui Jia, Ming-Wei Chang, William W. Cohen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a comprehensive set of automatic and human evaluations to show the capability of our model on generating highly faithful and creative images on DreamBench and DreamBench-v2. |
| Researcher Affiliation | Industry | Google DeepMind, Google Research {wenhuchen,hexiang,mingweichang,wcohen}@google.com |
| Pseudocode | Yes (see the sketch after the table) | Algorithm 1 Apprenticeship Learning from a Large Crowd of Specialized Expert Models |
| Open Source Code | No | To facilitate the reproducibility of our model performance, we release the SuTI model API as a Google Cloud Vertex AI model service, under the production name Instant Tuning. Generally available at https://cloud.google.com/vertex-ai/docs/generative-ai/image/fine-tune-model. This provides access to the model via an API, but not to its source code in a repository. |
| Open Datasets | Yes | We construct the seed dataset using the WebLI [10, 20] dataset. |
| Dataset Splits | No | The paper constructs a training dataset 'G' from expert models but does not specify train/validation splits for this dataset during training of the apprentice model. |
| Hardware Specification | Yes | We tune each model on a single TPU core (32 GB)... The apprentice training is performed on 128 Cloud TPU v4 chips. |
| Software Dependencies | No | The paper mentions optimizers (Adafactor) and models (CLIP ViT-L14, Imagen checkpoint) but does not provide specific version numbers for software libraries or environments such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes (see the training-step sketch after the table) | We tune each model on a single TPU core (32 GB) for 500 steps using Adafactor optimizer with a learning rate of 1e-5... We train the model for a total of 150K steps. We use an Adafactor optimizer with a learning rate of 1e-4. We use 3 demonstrations during training... We use a lower classifier-free guidance weight of 15 with DDPM [14] sampling strategy. |
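To make the Pseudocode row concrete, below is a minimal Python sketch of the loop in Algorithm 1: fine-tune one specialized expert per subject, sample images from each expert, keep only samples that pass a CLIP faithfulness filter, and distill the result into a single in-context apprentice. Every helper here (`finetune_expert`, `sample_images`, `clip_score`, `build_apprentice_dataset`, `train_apprentice`) is a hypothetical stub, not the paper's released API; the hyperparameter values mirror the quotes in the table.

```python
import random
from dataclasses import dataclass

# All names below are hypothetical stubs; the paper's code is not released.

@dataclass
class SubjectCluster:
    demonstrations: list  # image-text pairs depicting one subject
    prompts: list         # creative target prompts for that subject

def finetune_expert(checkpoint, demonstrations, steps=500, learning_rate=1e-5):
    """Stub for per-subject fine-tuning of the base text-to-image model."""
    return ("expert", checkpoint, tuple(demonstrations))

def sample_images(expert, prompt, n=4):
    """Stub for drawing n candidate images from a tuned expert."""
    return [f"image<{prompt}>#{i}" for i in range(n)]

def clip_score(image, prompt):
    """Stub for a CLIP text-image faithfulness score in [0, 1]."""
    return random.random()

def build_apprentice_dataset(clusters, base_checkpoint, threshold=0.3):
    """Algorithm 1, step 1: distill many experts into one training set G."""
    dataset = []
    for cluster in clusters:
        expert = finetune_expert(base_checkpoint, cluster.demonstrations)
        for prompt in cluster.prompts:
            for image in sample_images(expert, prompt):
                if clip_score(image, prompt) >= threshold:  # keep faithful samples
                    dataset.append((cluster.demonstrations, prompt, image))
    return dataset

def train_apprentice(dataset, steps=150_000, num_demos=3):
    """Algorithm 1, step 2: train one in-context apprentice on G."""
    for _ in range(steps):
        demonstrations, prompt, target = random.choice(dataset)
        # The apprentice conditions on a few demonstrations plus the prompt,
        # learning to match the expert output without per-subject tuning.
        _ = (demonstrations[:num_demos], prompt, target)  # gradient step goes here
    return "apprentice"
```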
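The Experiment Setup row quotes concrete optimizer details (Adafactor, learning rate 1e-5 for expert tuning, 1e-4 for the 150K-step apprentice run) without naming a framework. The sketch below shows how those optimizers might be instantiated in a JAX/optax stack; optax is an illustrative assumption, since the paper does not specify its libraries, and a toy quadratic loss stands in for the actual diffusion objective.

```python
import jax
import jax.numpy as jnp
import optax

# Assumed optax-based setup mirroring the quoted hyperparameters.
expert_optimizer = optax.adafactor(learning_rate=1e-5)      # per-subject expert tuning
apprentice_optimizer = optax.adafactor(learning_rate=1e-4)  # 150K-step apprentice run

# Toy parameters standing in for the diffusion model's weights.
params = {"w": jnp.zeros((4, 4))}
opt_state = apprentice_optimizer.init(params)

def loss_fn(params, batch):
    # Placeholder quadratic loss; the real objective is the denoising loss.
    return jnp.mean((params["w"] - batch) ** 2)

@jax.jit
def train_step(params, opt_state, batch):
    grads = jax.grad(loss_fn)(params, batch)
    updates, opt_state = apprentice_optimizer.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state

# One optimization step; the quoted setup would iterate this for 150K steps.
params, opt_state = train_step(params, opt_state, jnp.ones((4, 4)))
```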