VQ-FONT: Few-Shot Font Generation with Structure-Aware Enhancement and Quantization
Authors: Mingshuai Yao, Yabo Zhang, Xianhui Lin, Xiaoming Li, Wangmeng Zuo
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on a collected font dataset show that our VQ-Font outperforms the competing methods both quantitatively and qualitatively, especially in generating challenging styles. |
| Researcher Affiliation | Academia | Harbin Institute of Technology; Institute for Intelligent Computing; Peng Cheng Laboratory |
| Pseudocode | No | The paper provides architectural diagrams and descriptions but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/Yaomingshuai/VQ-Font. |
| Open Datasets | No | We follow previous works (Park et al. 2021a,b; Tang et al. 2022) and collect 382 fonts with various types to build our dataset. |
| Dataset Splits | No | We split these 3499 characters into 3 groups, i.e., 2841 seen characters, 158 reference characters, and 500 unseen characters. For each character, we follow FS-Font (Tang et al. 2022) and select 3 reference characters from the reference set that can cover most of its structure components. We use Kai font as our default content font and train our model on 371 seen fonts, leaving 10 unseen fonts that do not appear in the training stage. In this way, our training set consists of 371 seen fonts, each of which has 2841 seen characters (SFSC). Our test set consists of two parts, i.e., 10 seen fonts with 500 unseen characters (SFUC) and 10 unseen fonts with 500 unseen characters (UFUC), encompassing a diverse range of font types, such as handwriting, printing, and artistic styles. The paper specifies training and testing sets, but does not explicitly mention a distinct validation set with its size or split methodology. (The split bookkeeping is sketched after the table.) |
| Hardware Specification | Yes | We adopt Adam optimizer (Kingma and Ba 2014) with a batch size of 32 and rely on one A6000 GPU. |
| Software Dependencies | No | The paper mentions using Adam optimizer and a pre-trained VGG16 model, but does not provide specific version numbers for software dependencies like programming languages or libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | In the pre-training phase of our font codebook, we encode the font images into 16×16 features. The size of our font codebook is set to 1,024. At this stage, VQGAN is trained for 2e5 iterations with a learning rate of 4e-5. The number of attention heads in the cross-attention module is set to 8. We select 3 reference characters for each Chinese character. In the subsequent token prior refinement stage, we keep the pre-trained font codebook and the later layers of the decoder fixed, while concentrating on training the remaining layers of the VQ-Font for 300k iterations. Here, the learning rate is set to 2e-4. We adopt Adam optimizer (Kingma and Ba 2014) with a batch size of 32 and rely on one A6000 GPU. (A hedged configuration sketch follows the table.) |
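
The split arithmetic quoted above is self-consistent: 2841 + 158 + 500 = 3499 characters, and 371 seen fonts plus 10 unseen fonts leave one of the 382 collected fonts, presumably the Kai content font. Below is a minimal Python sketch of that bookkeeping. All identifiers and the random seed are hypothetical; the paper does not state how characters or fonts were assigned to each group.

```python
import random

# Hypothetical reconstruction of the splits described in the paper;
# the actual assignment procedure is not specified.
random.seed(0)

characters = [f"char_{i:04d}" for i in range(3499)]  # 3,499 Chinese characters
random.shuffle(characters)
seen_chars = characters[:2841]           # 2,841 seen characters (training)
reference_chars = characters[2841:2999]  # 158 reference characters
unseen_chars = characters[2999:]         # 500 unseen characters (testing)

fonts = [f"font_{i:03d}" for i in range(381)]  # 382 fonts minus the Kai content font
random.shuffle(fonts)
seen_fonts, unseen_fonts = fonts[:371], fonts[371:]  # 371 seen, 10 unseen
content_font = "Kai"  # default content font

# Training set: seen fonts x seen characters (SFSC).
sfsc = [(f, c) for f in seen_fonts for c in seen_chars]
# Test sets: 10 seen fonts x unseen characters (SFUC),
# and 10 unseen fonts x unseen characters (UFUC).
sfuc = [(f, c) for f in seen_fonts[:10] for c in unseen_chars]
ufuc = [(f, c) for f in unseen_fonts for c in unseen_chars]
```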
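
For quick reference, the hyperparameters quoted in the Experiment Setup cell can be collected into a single configuration object. This is a sketch only: the dataclass and its field names are illustrative assumptions and do not come from the official VQ-Font repository.

```python
from dataclasses import dataclass

@dataclass
class VQFontConfig:
    # Stage 1: font codebook pre-training (VQGAN)
    latent_size: int = 16        # font images encoded into 16x16 token grids
    codebook_size: int = 1024    # entries in the font codebook
    stage1_iters: int = 200_000  # 2e5 iterations
    stage1_lr: float = 4e-5
    # Stage 2: token prior refinement (codebook and later decoder layers frozen)
    num_attn_heads: int = 8      # heads in the cross-attention module
    num_references: int = 3      # reference characters per target character
    stage2_iters: int = 300_000
    stage2_lr: float = 2e-4
    # Shared optimization settings (Adam, one A6000 GPU)
    optimizer: str = "adam"
    batch_size: int = 32

cfg = VQFontConfig()
```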