Demystifying Neural Style Transfer
Authors: Yanghao Li, Naiyan Wang, Jiaying Liu, Xiaodi Hou
IJCAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose a novel interpretation of neural style transfer by treating it as a domain adaptation problem. Specifically, we theoretically show that matching the Gram matrices of feature maps is equivalent to minimizing the Maximum Mean Discrepancy (MMD) with the second-order polynomial kernel. Thus, we argue that the essence of neural style transfer is to match the feature distributions between the style images and the generated images. To further support our standpoint, we experiment with several other distribution alignment methods and achieve appealing results. (A minimal numerical check of the Gram/MMD equivalence is sketched after the table.) |
| Researcher Affiliation | Academia | Yanghao Li, Naiyan Wang, Jiaying Liu, Xiaodi Hou. Affiliations: Institute of Computer Science and Technology, Peking University; TuSimple. Emails: lyttonhao@pku.edu.cn, winsty@gmail.com, liujiaying@pku.edu.cn, xiaodi.hou@gmail.com |
| Pseudocode | No | The paper describes methods using mathematical equations and prose, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: "Our implementation is based on the MXNet [Chen et al., 2016] implementation1 which reproduces the results of original neural style transfer [Gatys et al., 2016]." Footnote 1 links to a general MXNet example. While they based their work on it, they do not explicitly state that the source code for *their novel contributions* (e.g., the different MMD kernels or BN statistics matching) is openly available or provided. |
| Open Datasets | No | The paper states: "The images in the experiments are collected from the public implementations of neural style transfer" (footnotes 1-3). While these implementations are public, the paper does not specify a formal dataset name, provide a direct link to the dataset itself, or cite a dataset with proper author/year attribution. Since neural style transfer experiments operate on individual images, traditional large-scale training/validation/test splits are also not typically applicable here. |
| Dataset Splits | No | The paper performs image style transfer, which involves optimizing single images rather than training and validating models on distinct dataset splits. Therefore, it does not provide specific dataset split information (e.g., percentages or sample counts for training, validation, and test sets). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run its experiments. It mentions using the VGG-19 network, but this refers to the model architecture, not the computational hardware. |
| Software Dependencies | No | The paper mentions: "Our implementation is based on the MXNet [Chen et al., 2016] implementation1". While MXNet is a software library, a specific version number for MXNet is not provided in the text, which is required for a reproducible software dependency description. |
| Experiment Setup | Yes | In the implementation, we use the VGG-19 network [Simonyan and Zisserman, 2015] following the choice in [Gatys et al., 2016]. We also adopt the relu4_2 layer for the content loss, and relu1_1, relu2_1, relu3_1, relu4_1, relu5_1 for the style loss. The default weight factor w_l is set as 1.0 if it is not specified. The target image x is initialized randomly and optimized iteratively until the relative change between successive iterations is under 0.5%. The maximum number of iterations is set as 1000. For the method with Gaussian kernel MMD, the kernel bandwidth σ² is fixed as the mean of squared ℓ2 distances of the sampled pairs, since it does not strongly affect the visual results. Our implementation is based on the MXNet [Chen et al., 2016] implementation (footnote 1), which reproduces the results of the original neural style transfer [Gatys et al., 2016]. Since the scales of the gradients of the style loss differ across methods, and the weights α and β in Eq. 3 affect the results of style transfer, we fix some factors to make a fair comparison. Specifically, we set α = 1 because the content losses are the same among the different methods. Then, for each method, we first manually select a proper β̂ such that the gradients on x from the style loss are of the same order of magnitude as those from the content loss. Thus, we can manipulate a balance factor γ (β = γβ̂) to trade off content and style matching. (Illustrative sketches of this setup follow the table.) |
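
The Gram/MMD equivalence reported in the Research Type row can be checked numerically. Below is a minimal NumPy sketch, not taken from the paper's implementation; the array shapes and the unnormalized form of the Gram loss are assumptions for illustration. It verifies that the squared Frobenius distance between Gram matrices equals M² times the biased squared MMD with the kernel k(x, y) = (xᵀy)².

```python
import numpy as np

# Minimal numerical check (not the authors' code): feature maps at one layer are
# reshaped to (N channels, M spatial positions); columns are the feature samples.
rng = np.random.default_rng(0)
N, M = 64, 100
F = rng.standard_normal((N, M))     # generated-image features (stand-in values)
S = rng.standard_normal((N, M))     # style-image features (stand-in values)

# Gram-matrix style loss, unnormalized: || F F^T - S S^T ||_F^2
gram_loss = np.sum((F @ F.T - S @ S.T) ** 2)

# Biased MMD^2 with the second-order polynomial kernel k(x, y) = (x^T y)^2
k_ff = (F.T @ F) ** 2
k_ss = (S.T @ S) ** 2
k_fs = (F.T @ S) ** 2
mmd2 = (k_ff.sum() + k_ss.sum() - 2 * k_fs.sum()) / M ** 2

print(np.allclose(gram_loss, M ** 2 * mmd2))   # True: the two losses agree up to M^2
```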
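
The optimization procedure quoted in the Experiment Setup row can be sketched as follows. This is an illustrative re-implementation assuming a PyTorch/torchvision stack rather than the paper's MXNet code; the layer indices are torchvision's VGG-19 feature indices for the named layers, ImageNet normalization and image I/O are omitted, and the L-BFGS optimizer and the per-method `beta_hat` value are assumptions.

```python
import torch
import torchvision.models as models

CONTENT_LAYER = 22                                            # relu4_2
STYLE_LAYERS = {1: 1.0, 6: 1.0, 11: 1.0, 20: 1.0, 29: 1.0}    # relu1_1..relu5_1, w_l = 1.0

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def extract(img):
    """Collect activations at the content layer and the style layers."""
    feats, x = {}, img
    for idx, layer in enumerate(vgg):
        x = layer(x)
        if idx == CONTENT_LAYER or idx in STYLE_LAYERS:
            feats[idx] = x
        if idx >= max(CONTENT_LAYER, max(STYLE_LAYERS)):
            break
    return feats

def gram_style_loss(f, s):
    """Gram-matrix style loss at one layer (equivalently, poly-kernel MMD)."""
    n, m = f.shape[1], f.shape[2] * f.shape[3]
    F, S = f.reshape(n, m), s.reshape(n, m)
    return ((F @ F.t() - S @ S.t()) ** 2).sum() / (4 * n ** 2 * m ** 2)

def transfer(content_img, style_img, beta_hat, gamma=1.0, max_iter=1000, tol=5e-3):
    """content_img, style_img: (1, 3, H, W) tensors, already normalized for VGG."""
    alpha, beta = 1.0, gamma * beta_hat           # alpha fixed to 1; beta = gamma * beta_hat
    with torch.no_grad():
        c_feats, s_feats = extract(content_img), extract(style_img)
    x = torch.rand_like(content_img, requires_grad=True)      # random initialization
    opt = torch.optim.LBFGS([x], max_iter=1)
    prev = None
    for _ in range(max_iter):                                  # at most 1000 iterations
        def closure():
            opt.zero_grad()
            feats = extract(x)
            c_loss = 0.5 * ((feats[CONTENT_LAYER] - c_feats[CONTENT_LAYER]) ** 2).sum()
            s_loss = sum(w * gram_style_loss(feats[l], s_feats[l])
                         for l, w in STYLE_LAYERS.items())
            total = alpha * c_loss + beta * s_loss
            total.backward()
            return total
        loss = opt.step(closure).item()
        # stop once the relative change between successive iterations is under 0.5%
        if prev is not None and abs(prev - loss) / max(abs(prev), 1e-12) < tol:
            break
        prev = loss
    return x.detach()
```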
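
The setup also fixes the Gaussian-kernel MMD bandwidth σ² to the mean squared ℓ2 distance of sampled feature pairs. The helper below illustrates that heuristic; the pair-sampling scheme, the sample count, and the function name are assumptions made for this sketch.

```python
import torch

def gaussian_bandwidth(F, S, n_pairs=1024, generator=None):
    """Hypothetical helper: sigma^2 = mean squared l2 distance over randomly sampled
    pairs of feature vectors, pooled from generated (F) and style (S) activations.
    F, S: (M, N) matrices whose rows are feature vectors at spatial positions."""
    X = torch.cat([F, S], dim=0)
    i = torch.randint(0, X.shape[0], (n_pairs,), generator=generator)
    j = torch.randint(0, X.shape[0], (n_pairs,), generator=generator)
    return ((X[i] - X[j]) ** 2).sum(dim=1).mean()
```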