Dynamic Instance Normalization for Arbitrary Style Transfer

Authors: Yongcheng Jing, Xiao Liu, Yukang Ding, Xinchao Wang, Errui Ding, Mingli Song, Shilei Wen

AAAI 2020, pp. 4369-4376

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that the proposed approach yields very encouraging results on challenging style patterns and, to our best knowledge, for the first time enables arbitrary style transfer using a MobileNet-based lightweight architecture, leading to a reduction factor of more than twenty in computational cost as compared to existing approaches.
Researcher Affiliation | Collaboration | Yongcheng Jing (1), Xiao Liu (2), Yukang Ding (2), Xinchao Wang (3), Errui Ding (2), Mingli Song (1), Shilei Wen (2); (1) Zhejiang University, (2) Department of Computer Vision Technology (VIS), Baidu Inc., (3) Stevens Institute of Technology
Pseudocode | No | The paper includes network diagrams and mathematical formulas but no explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | Our network is trained on 82,783 content images from the Microsoft COCO dataset (Lin et al. 2014), and 79,433 style images from WikiArt (Nichol 2016).
Dataset Splits | No | The paper mentions training and testing datasets but does not specify a validation split (percentages, sample counts, or an explicit validation set name).
Hardware Specification | Yes | The training takes roughly one day on an NVIDIA Tesla V100 GPU.
Software Dependencies | No | The paper mentions using specific models and optimizers such as Adam and VGG-19, but does not provide version numbers for general software dependencies such as the programming language, deep learning framework (e.g., PyTorch, TensorFlow), or other libraries.
Experiment Setup | Yes | The content loss is computed at layer {relu4_1}, while the style loss is computed at layers {relu1_1, relu2_1, relu3_1, relu4_1} of the VGG network. During training, we adopt the Adam optimizer (Kingma and Ba 2015). The learning rates for both the image encoder and decoder are set to 0.0001. The weight and bias networks in the DIN layers are set to a 10x learning rate for faster convergence.
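
For concreteness, the sketch below shows one plausible PyTorch reading of this training setup. It is not the authors' implementation: the DynamicInstanceNorm module, the style_dim argument, and the build_optimizer helper are hypothetical names, and DIN is reduced here to a style-conditioned scale and shift after instance normalization, whereas the paper's weight and bias networks may generate richer dynamic parameters. Only the loss layers, base learning rate, and 10x multiplier follow the setup quoted above.

```python
# Minimal sketch (not the authors' code) of the reported training configuration,
# under the assumptions stated above.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicInstanceNorm(nn.Module):
    """Hypothetical DIN layer: style-conditioned scale and shift after IN."""

    def __init__(self, channels, style_dim):
        super().__init__()
        # Lightweight "weight" and "bias" networks that generate
        # per-instance affine parameters from a style code.
        self.weight_net = nn.Linear(style_dim, channels)  # produces gamma
        self.bias_net = nn.Linear(style_dim, channels)    # produces beta

    def forward(self, content_feat, style_code):
        normalized = F.instance_norm(content_feat)
        gamma = self.weight_net(style_code).unsqueeze(-1).unsqueeze(-1)
        beta = self.bias_net(style_code).unsqueeze(-1).unsqueeze(-1)
        return gamma * normalized + beta


# VGG-19 loss layers as stated in the setup.
CONTENT_LAYERS = ["relu4_1"]
STYLE_LAYERS = ["relu1_1", "relu2_1", "relu3_1", "relu4_1"]


def build_optimizer(encoder, decoder, din_layers, base_lr=1e-4):
    """Adam with lr = 1e-4 for encoder/decoder and a 10x larger lr for the
    DIN weight/bias networks, our reading of the quoted setup."""
    din_params = [p for layer in din_layers for p in layer.parameters()]
    return torch.optim.Adam([
        {"params": list(encoder.parameters()) + list(decoder.parameters()),
         "lr": base_lr},
        {"params": din_params, "lr": 10 * base_lr},
    ])
```

A full training loop would additionally extract VGG features at CONTENT_LAYERS and STYLE_LAYERS to form the perceptual content and style losses; those losses are omitted here since the paper excerpt above only specifies where they are computed.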