DE-net: Dynamic Text-Guided Image Editing Adversarial Networks

Authors: Ming Tao, Bing-Kun Bao, Hao Tang, Fei Wu, Longhui Wei, Qi Tian

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments. In this section, we introduce the datasets, training details, and evaluation metrics used in our experiments. Then we compare the text-guided image editing performance with previous models quantitatively and qualitatively."
Researcher Affiliation | Collaboration | 1 Nanjing University of Posts and Telecommunications, 2 CVL, ETH Zürich, 3 Huawei Inc.; contact: bingkunbao@njupt.edu.cn
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link regarding the availability of its source code.
Open Datasets | Yes | "Datasets. We conduct experiments on two challenging datasets: CUB bird (Wah et al. 2011) and COCO (Lin et al. 2014). For the CUB bird dataset, there are 11,788 images belonging to 200 bird species, with each image corresponding to ten language descriptions. The COCO dataset contains 80k images for training and 40k images for testing. Each image corresponds to 5 language descriptions." (A loading sketch follows the table.)
Dataset Splits | No | The paper gives the COCO training/testing split (80k training, 40k testing images) and the CUB total (11,788 images), but does not specify a validation split or explicit counts/percentages for every split.
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions the Adam optimizer and pre-trained networks (DAMSM, CLIP) but does not give version numbers for software dependencies (e.g., Python, PyTorch, CUDA).
Experiment Setup | Yes | "We employ the Adam optimizer (Kingma and Ba 2015) with β1 = 0.0 and β2 = 0.9 to train our model. According to the Two Time-scale Update Rule (TTUR) (Heusel et al. 2017), the learning rate is set to 0.0001 for the generator and 0.0004 for the discriminator. The hyper-parameters of the discriminator k and p are set to 2 and 6 as in (Tao et al. 2022). The hyper-parameters of the generator λ1 and λ2 are set to 40 and 4 for all the datasets." (A configuration sketch follows the table.)
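
Both datasets are standard public benchmarks. As a rough illustration of the quoted COCO split (not the authors' pipeline), the image-caption pairing can be loaded with torchvision's CocoCaptions; the local paths below are hypothetical, and pycocotools must be installed.

    from torchvision.datasets import CocoCaptions

    # Hypothetical local paths; the COCO 2014 train/val splits match the
    # 80k/40k counts quoted above. Requires the pycocotools package.
    train_set = CocoCaptions(root="coco/train2014",
                             annFile="coco/annotations/captions_train2014.json")
    test_set = CocoCaptions(root="coco/val2014",
                            annFile="coco/annotations/captions_val2014.json")

    image, captions = train_set[0]   # a PIL image and its (roughly 5) captions
    print(len(train_set), len(test_set))  # about 80k and 40k images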
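
The quoted setup is concrete enough to reconstruct the optimizer configuration. Below is a minimal PyTorch sketch, assuming placeholder generator and discriminator modules; the paper releases no code, so only the hyper-parameter values come from the text, and how the weighted loss terms combine is DE-net-specific and not reproduced here.

    import torch
    from torch import nn

    # Placeholder networks standing in for DE-net's generator and discriminator.
    netG = nn.Sequential(nn.Linear(128, 128))  # hypothetical stand-in
    netD = nn.Sequential(nn.Linear(128, 1))    # hypothetical stand-in

    # Adam with beta1 = 0.0, beta2 = 0.9 (Kingma and Ba 2015), with TTUR
    # learning rates (Heusel et al. 2017): generator 1e-4, discriminator 4e-4.
    opt_g = torch.optim.Adam(netG.parameters(), lr=1e-4, betas=(0.0, 0.9))
    opt_d = torch.optim.Adam(netD.parameters(), lr=4e-4, betas=(0.0, 0.9))

    # Generator loss weights reported in the paper.
    lambda1, lambda2 = 40.0, 4.0

    # Discriminator hyper-parameters k = 2, p = 6, following (Tao et al. 2022),
    # where they scale a gradient-penalty term of the form k * ||grad||^p.
    k, p = 2, 6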