AK4Prompts: Aesthetics-driven Automatically Keywords-Ranking for Prompts in Text-To-Image Models
Authors: Haiyang Zhang, Mengchao Wang, Shuai He, Anlong Ming
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results show that AK4Prompts significantly improves the quality of generated images over strong baselines. |
| Researcher Affiliation | Academia | School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications {zhhy, wangmengchao, hs19951021, mal}@bupt.edu.cn |
| Pseudocode | No | The paper includes illustrations and equations in Figure 2 but does not provide structured pseudocode or an algorithm block. |
| Open Source Code | Yes | Our code is available at https://github.com/mRobotit/AK4Prompts. |
| Open Datasets | Yes | We train our model using BeautifulPrompt's training set and evaluate its performance on the test set. The dataset includes 143k simple prompts and 2k test prompts. [Cao et al., 2023] |
| Dataset Splits | No | The paper specifies a training set of 143k and a test set of 2k prompts but does not explicitly mention a distinct validation set or its split details. |
| Hardware Specification | Yes | All experiments were implemented in PyTorch and run on a single server with NVIDIA RTX 3090 Ti GPUs. |
| Software Dependencies | No | The paper mentions 'implemented in PyTorch' but does not specify a version number for PyTorch or any other software dependencies with their versions. |
| Experiment Setup | Yes | We generated 512x512 resolution images through a four-step inference with a CFG scale ω set to 1.0, leveraging FLOAT16 formats to save GPU memory and speed up training. For the semantic fusion module, we set L = 3. Regarding Ltotal, we set h = 2.25, c = 2.25, and a = 1. The learning rate was set at 1e-4, weight decay at 1e-2, batch size at 32, and training step at 88,000 steps. |
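
The values in the "Experiment Setup" row map directly onto a training configuration. Below is a minimal, hypothetical PyTorch sketch that collects the reported numbers in one place; the identifier names, the placeholder model, and the choice of AdamW are assumptions for illustration, since the paper reports only the hyperparameter values, not the code.

```python
import torch

# Settings reported in the paper's "Experiment Setup" row; all names here
# are hypothetical, only the numeric values come from the paper.
config = dict(
    resolution=512,            # 512x512 generated images
    inference_steps=4,         # four-step inference
    cfg_scale=1.0,             # CFG scale (omega)
    dtype=torch.float16,       # FLOAT16 to save GPU memory and speed up training
    fusion_layers=3,           # semantic fusion module, L = 3
    loss_weights=dict(h=2.25, c=2.25, a=1.0),  # weights in the total loss
    lr=1e-4,
    weight_decay=1e-2,
    batch_size=32,
    train_steps=88_000,
)

# An AdamW setup consistent with these values (the optimizer choice itself
# is an assumption; the paper reports only learning rate and weight decay).
model = torch.nn.Linear(8, 8)  # placeholder standing in for the AK4Prompts model
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=config["lr"],
    weight_decay=config["weight_decay"],
)
```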