AesFA: An Aesthetic Feature-Aware Arbitrary Neural Style Transfer
Authors: Joonwoo Kwon, Sooyoung Kim, Yuewei Lin, Shinjae Yoo, Jiook Cha
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and ablations show that the approach not only outperforms recent NST methods in stylization quality but also achieves faster inference. The Experimental Results section assesses the proposed model's validity both qualitatively and quantitatively in comparison to state-of-the-art NST approaches, including AesUST (Wang et al. 2022), AdaIN (Huang and Belongie 2017), AdaConv (Chandran et al. 2021), MicroAST (Wang et al. 2023), EFDM (Zhang et al. 2022a), AdaAttN (Liu et al. 2021a), IECAST (Chen et al. 2021), and StyTr2 (Deng et al. 2022). |
| Researcher Affiliation | Collaboration | Joonwoo Kwon1*, Sooyoung Kim1*, Yuewei Lin2 , Shinjae Yoo2 , Jiook Cha1 1Seoul National University 2Brookhaven National Laboratory |
| Pseudocode | No | No pseudocode or algorithm block is present in the paper. The architecture is described using diagrams and text. |
| Open Source Code | Yes | Codes are available at https://github.com/Sooyyoungg/AesFA. |
| Open Datasets | Yes | To train our model, we use the COCO dataset (Lin et al. 2014) as content images and the WikiArt dataset (Phillips and Mackintosh 2011) as style images. |
| Dataset Splits | No | The paper mentions using the COCO and WikiArt datasets for training and 10 content and 20 style images for testing, but does not provide explicit train/validation/test splits with percentages or sample counts. |
| Hardware Specification | Yes | All experiments were conducted using the PyTorch framework (Paszke et al. 2019) on a single NVIDIA A100 (40 GB) GPU. |
| Software Dependencies | No | The paper mentions using the PyTorch framework (Paszke et al. 2019) but does not specify a version number for PyTorch or any other software dependency. |
| Experiment Setup | Yes | During training, images are rescaled to 512 pixels while maintaining the original aspect ratio, then randomly cropped to 256×256 pixels for augmentation. The model is trained using the Adam optimizer (Kingma and Ba 2014) with a learning rate of 0.0001 and a batch size of 8 for 160,000 iterations. The aesthetic feature has dimensions of (256, 3, 3) for both high- and low-frequency components. In this paper, we use λ_C = 1, λ_S = 10, and λ_Aes = 5. |
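
For reference, the Experiment Setup row translates into a short training script. The sketch below assumes PyTorch and torchvision; `model`, `loader`, and the three loss functions are hypothetical placeholders standing in for the AesFA network and its content/style/aesthetic losses (the real definitions live in the authors' repository), while the transforms, optimizer settings, batch size, iteration count, and loss weights are taken from the paper.

```python
# Minimal sketch of the reported training configuration. Only the transforms,
# optimizer settings, batch size, iteration count, and loss weights come from
# the paper; everything passed in as an argument is a placeholder.
from torch import optim
from torchvision import transforms

# Rescale the shorter side to 512 px (aspect ratio preserved), then randomly
# crop 256x256 patches for augmentation.
train_transform = transforms.Compose([
    transforms.Resize(512),
    transforms.RandomCrop(256),
    transforms.ToTensor(),
])

# Loss weights from the paper: lambda_C = 1, lambda_S = 10, lambda_Aes = 5.
LAMBDA_C, LAMBDA_S, LAMBDA_AES = 1.0, 10.0, 5.0


def train(model, loader, content_loss, style_loss, aes_loss, device="cuda"):
    """Train for 160,000 iterations with Adam (lr 1e-4).

    `loader` is assumed to yield batches of 8 (COCO content image,
    WikiArt style image) pairs, already preprocessed by `train_transform`.
    """
    model = model.to(device)
    optimizer = optim.Adam(model.parameters(), lr=1e-4)
    for step, (content, style) in enumerate(loader):
        if step >= 160_000:
            break
        content, style = content.to(device), style.to(device)
        stylized = model(content, style)
        # Weighted sum of the paper's three loss terms.
        loss = (LAMBDA_C * content_loss(stylized, content)
                + LAMBDA_S * style_loss(stylized, style)
                + LAMBDA_AES * aes_loss(stylized, style))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```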