Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation

Authors: Bowen Li, Xiaojuan Qi, Philip Torr, Thomas Lukasiewicz

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our model on the CUB bird [27] and more complicated COCO [17] datasets, comparing with the current state of the art, ManiGAN [15], which also focuses on text-guided image manipulation. Results for the method are reproduced using the code released by the authors. Table 1: Quantitative comparison: Fréchet inception distance (FID), accuracy, and realism of the state of the art and our method on CUB and COCO."
Researcher Affiliation | Academia | "¹University of Oxford, ²University of Hong Kong"
Pseudocode | No | The paper contains architectural diagrams (Figure 2) but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The code will be available at https://github.com/mrlibw/Lightweight-Manipulation."
Open Datasets | Yes | "The CUB bird [27] dataset contains 8,855 training images and 2,933 test images... COCO [17] contains 82,783 training images and 40,504 validation images..."
Dataset Splits | Yes | "COCO [17] contains 82,783 training images and 40,504 validation images"
Hardware Specification | Yes | "All methods are benchmarked on a single Quadro RTX 6000 GPU."
Software Dependencies | No | The paper mentions software components and optimizers, such as Inception-v3, VGG-16, a bidirectional RNN, and the Adam optimiser, but it does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | "The scale of the output images is 256 x 256, but the size is adjustable to satisfy users' preferences. Similarly to [15], there is a trade-off between the generation of new attributes matching the text description and the preservation of text-irrelevant contents of the original image. Therefore, based on the manipulative precision (MP) [15], the whole model is trained for 100 epochs on CUB and 10 epochs on COCO using the Adam optimiser [12] with learning rate 0.0002. The hyperparameters λ1, λ2, λ3, and λ4 are all set to 1 for both datasets."
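The reported setup can be collected into a small configuration sketch. This is a minimal illustration of the quoted hyperparameters only, assuming a generic loss-weighting scheme: the field names, the `TrainConfig` class, and the `total_loss` helper are hypothetical and do not come from the authors' released code, which defines its own structures.

```python
from dataclasses import dataclass, field
from typing import Tuple, List

@dataclass
class TrainConfig:
    """Hypothetical container for the hyperparameters reported in the paper."""
    image_size: int = 256                  # output images are 256 x 256
    epochs_cub: int = 100                  # training epochs on CUB
    epochs_coco: int = 10                  # training epochs on COCO
    lr: float = 0.0002                     # Adam optimiser learning rate
    # λ1..λ4 are all set to 1 for both datasets
    lambdas: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)

def total_loss(cfg: TrainConfig, terms: List[float]) -> float:
    """Weighted sum of the four loss terms; an illustrative combination,
    not the paper's exact loss formulation."""
    return sum(lam * t for lam, t in zip(cfg.lambdas, terms))

cfg = TrainConfig()
```

With all λ values set to 1, `total_loss` reduces to a plain sum of the four terms, matching the statement that no term is up- or down-weighted on either dataset.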