Painterly Image Harmonization by Learning from Painterly Objects

Authors: Li Niu, Junyan Cao, Yan Hong, Liqing Zhang

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Extensive experiments on the benchmark dataset demonstrate the effectiveness of our proposed method." |
| Researcher Affiliation | Academia | "MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University. {ustcnewly, joyc1, hy2628982280, lqzhang}@sjtu.edu.cn" |
| Pseudocode | No | The paper describes its method and network structure in detail but does not provide any explicitly labeled "Pseudocode" or "Algorithm" blocks. |
| Open Source Code | No | "3) We will release our annotated reference images/objects, which would greatly benefit the future research of painterly image harmonization." |
| Open Datasets | Yes | "Based on 57,025 artistic paintings in the training set of WikiArt (Nichol 2016), we use off-the-shelf object detection model (Wu et al. 2019) pretrained on COCO (Lin et al. 2014) dataset to detect objects in artistic paintings." |
| Dataset Splits | No | The paper mentions using training data from COCO and WikiArt and refers to "100 test images" for efficiency analysis, but it does not give explicit counts or percentages for training, validation, and test splits in the main text. |
| Hardware Specification | Yes | "Our network is implemented using PyTorch 1.10.0 and trained using Adam optimizer with learning rate of 1e-4 on Ubuntu 20.04 LTS operating system, with 128GB memory, Intel(R) Xeon(R) Silver 4116 CPU, and one GeForce RTX 3090 GPU." |
| Software Dependencies | Yes | "Our network is implemented using PyTorch 1.10.0 and trained using Adam optimizer with learning rate of 1e-4 on Ubuntu 20.04 LTS operating system, with 128GB memory, Intel(R) Xeon(R) Silver 4116 CPU, and one GeForce RTX 3090 GPU." |
| Experiment Setup | Yes | "Our network is implemented using PyTorch 1.10.0 and trained using Adam optimizer with learning rate of 1e-4 on Ubuntu 20.04 LTS operating system, with 128GB memory, Intel(R) Xeon(R) Silver 4116 CPU, and one GeForce RTX 3090 GPU. For the encoder and decoder structure, we follow (Cao, Hong, and Niu 2023). For P module, we use one residual block (He et al. 2016). For Ml module in the l-th encoder layer, we stack three ResMLP layers (Touvron et al. 2023), in which the intermediate dimension is equal to the dimension of style vector in the l-th layer." |