IF-Font: Ideographic Description Sequence-Following Font Generation
Authors: Xinping Chen, Xiao Ke, Wenzhong Guo
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our method greatly outperforms state-of-the-art methods in both one-shot and few-shot settings, particularly when the target styles differ significantly from the training font styles. |
| Researcher Affiliation | Academia | Xinping Chen, Xiao Ke, Wenzhong Guo. Fujian Provincial Key Laboratory of Networking Computing and Intelligent Information Processing, College of Computer and Data Science, Fuzhou University, Fuzhou 350116, China; Engineering Research Center of Big Data Intelligence, Ministry of Education, Fuzhou 350116, China. {221027017, kex, guowenzhong}@fzu.edu.cn |
| Pseudocode | No | The paper includes mathematical equations and architectural diagrams, but no explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | The code is available at https://github.com/Stareven233/IF-Font. |
| Open Datasets | No | We gathered 464 fonts from the Internet, covering diverse categories like printed, handwritten, and artistic styles. Next, we selected 3,500 commonly encountered Chinese characters and rendered them into 128x128 resolution images using the collected fonts. We found a publicly accessible IDS decomposition table (https://babelstone.co.uk/CJK/IDS.TXT). However, it exhibits several redundant entries and circular references, as well as an absence of some characters. Therefore, we performed simplifications and enhancements, reducing the number of IDCs to the 12 depicted in Table 1, which is sufficient for most frequently used Chinese characters. For convenience, we set the basic component's IDS as itself. |
| Dataset Splits | No | The training set comprises 3,300 randomly selected Chinese characters and 424 fonts, referred to as Seen Fonts and Seen Characters (SFSC). There are two test sets: the first includes the same 3,300 characters but with 40 different fonts, called Unseen Fonts and Seen Characters (UFSC). The second test set consists of the remaining 200 characters and the same 40 fonts, known as Unseen Fonts and Unseen Characters (UFUC). (No explicit validation set is mentioned, only training and test sets.) |
| Hardware Specification | Yes | IF-Font was trained on a server equipped with an Intel Xeon Silver 4110 CPU, 128 GB of RAM, and an NVIDIA Tesla V100 PCIe 16GB GPU. |
| Software Dependencies | No | The paper mentions VQGAN and other libraries (e.g., torchmetrics), but does not specify their version numbers or the specific deep learning framework used with its version number. |
| Experiment Setup | Yes | It has a codebook size of 256, and the encoder downsamples the image with a factor of 8. Consequently, the length of the codebook indices corresponding to a single glyph is l_T = 256, whereas the IDS have a fixed length of l_I = 35. Our decoder consists of 10 Transformer blocks, each integrating a self-attention layer, a cross-attention layer, and a multi-layer perceptron (MLP). We have configured the model with 8 attention heads and a feature dimension of 384. In IF-Font, parameters are optimized using the AdamW optimizer [32], which employs a learning rate schedule that includes warmup and cosine annealing. |
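The SFSC/UFSC/UFUC partition quoted above can be reproduced from the reported counts (3,500 characters, 464 fonts; 3,300/424 for training, 200/40 held out). The sketch below is illustrative only: the function name, the shuffle seed, and the decision to hold out the last 40 fonts are assumptions, not details from the paper.

```python
import random

def make_splits(characters, fonts, n_train_chars=3300, n_train_fonts=424, seed=0):
    """Build the SFSC (train), UFSC (test 1), and UFUC (test 2) splits.

    Characters are shuffled so the 3,300 training characters are randomly
    selected, matching the description; which fonts are held out is an
    assumption here (the paper does not say how the 40 test fonts were chosen).
    """
    rng = random.Random(seed)
    chars = list(characters)
    rng.shuffle(chars)
    seen_chars, unseen_chars = chars[:n_train_chars], chars[n_train_chars:]
    seen_fonts, unseen_fonts = list(fonts[:n_train_fonts]), list(fonts[n_train_fonts:])
    sfsc = [(c, f) for f in seen_fonts for c in seen_chars]      # Seen Fonts, Seen Characters
    ufsc = [(c, f) for f in unseen_fonts for c in seen_chars]    # Unseen Fonts, Seen Characters
    ufuc = [(c, f) for f in unseen_fonts for c in unseen_chars]  # Unseen Fonts, Unseen Characters
    return sfsc, ufsc, ufuc
```

Note that no character-font pair in UFSC or UFUC appears in SFSC, since the 40 test fonts are disjoint from the 424 training fonts.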
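The decoder description (10 blocks of self-attention, cross-attention, and MLP; 8 heads; feature dimension 384) can be sketched in PyTorch as below. This is an assumption-laden reconstruction: the paper lists only the sub-layers, so the pre-norm placement, residual connections, GELU activation, and 4x MLP expansion are conventional choices, not confirmed details of IF-Font.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One decoder block: self-attention, cross-attention, MLP (per the paper's
    list of sub-layers); norm placement and MLP width are assumptions."""

    def __init__(self, dim=384, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(dim) for _ in range(3))

    def forward(self, x, context):
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]       # attend over the sequence itself
        h = self.norm2(x)
        x = x + self.cross_attn(h, context, context, need_weights=False)[0]  # attend over conditioning tokens
        x = x + self.mlp(self.norm3(x))
        return x

# 10 blocks, as reported in the table
decoder = nn.ModuleList(DecoderBlock() for _ in range(10))
```

With the quoted lengths, the target sequence would have l_I = 35 IDS tokens and the conditioning context l_T = 256 codebook-index tokens, each embedded to dimension 384.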
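The quoted optimizer setup (AdamW with linear warmup followed by cosine annealing) is a standard recipe; a minimal sketch is below. The warmup and total step counts and the base learning rate are illustrative placeholders, since the paper's quoted text does not report them.

```python
import math
import torch

def warmup_cosine(step, warmup_steps=1000, total_steps=100_000):
    """Learning-rate multiplier: linear warmup, then cosine decay to zero.
    Step counts here are assumptions, not values from the paper."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))

# Attach the schedule to AdamW via LambdaLR (base lr is a placeholder).
model = torch.nn.Linear(384, 384)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, warmup_cosine)
```

Calling `scheduler.step()` once per training step multiplies the base learning rate by `warmup_cosine(step)`, rising linearly to the peak and then decaying along a half cosine.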