Controllable Text-to-Image Generation

Authors: Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, Philip Torr

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on benchmark datasets demonstrate that our method outperforms existing state of the art, and is able to effectively manipulate synthetic images using natural language descriptions.
Researcher Affiliation | Academia | Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, Philip H. S. Torr, University of Oxford; {bowen.li, thomas.lukasiewicz}@cs.ox.ac.uk; {xiaojuan.qi, philip.torr}@eng.ox.ac.uk
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/mrlibw/ControlGAN.
Open Datasets | Yes | Our method is evaluated on the CUB bird [23] and the MS COCO [10] datasets.
Dataset Splits | Yes | The CUB dataset contains 8,855 training images and 2,933 test images... As for the COCO dataset, it contains 82,783 training images and 40,504 validation images...
Hardware Specification | No | The paper does not specify the hardware used for training or experimentation (e.g., GPU models, CPU types, memory); it mentions software components and training parameters but gives no hardware details.
Software Dependencies | No | The paper mentions models (VGG-16, LSTM) and optimizers (Adam) but does not provide version numbers for any software dependencies such as the programming language, libraries, or frameworks (e.g., Python, PyTorch/TensorFlow, CUDA).
Experiment Setup | Yes | There are three stages (K = 3) in our ControlGAN generator following [25]. The three scales are 64x64, 128x128, and 256x256... The text encoder is a pre-trained bidirectional LSTM... The whole network is trained using the Adam optimiser [8] with the learning rate 0.0002. The hyper-parameters λ1, λ2, λ3, and λ4 are set to 0.5, 1, 1, and 5 for both datasets, respectively.
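
The Experiment Setup row is concrete enough to sketch in code. Below is a minimal, hypothetical PyTorch reconstruction: only the learning rate (0.0002) and the λ1–λ4 weights (0.5, 1, 1, 5) come from the paper; the Adam betas, the placeholder generator, and the grouping of the auxiliary loss terms are assumptions.

```python
import torch

# Values reported in the paper.
LR = 2e-4                       # Adam learning rate
LAMBDAS = (0.5, 1.0, 1.0, 5.0)  # lambda_1 .. lambda_4, same for both datasets

class Generator(torch.nn.Module):
    """Stand-in for the three-stage (K = 3) generator that outputs
    64x64, 128x128, and 256x256 images; the real architecture follows
    [25] and is elided here."""
    def __init__(self):
        super().__init__()
        self.body = torch.nn.Linear(100, 100)  # placeholder parameters

netG = Generator()

# betas=(0.5, 0.999) is a common GAN default, not stated in the paper.
optim_G = torch.optim.Adam(netG.parameters(), lr=LR, betas=(0.5, 0.999))

def generator_loss(adv, aux):
    """Weighted sum of the adversarial loss and four auxiliary terms.
    Which concrete loss each lambda weights is defined in the paper's
    objective; `aux` is just a 4-tuple of placeholders here."""
    return adv + sum(l * t for l, t in zip(LAMBDAS, aux))
```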
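
Similarly, the counts quoted in the Dataset Splits row match the standard COCO 2014 train/val partition and can be checked with torchvision; the local paths below are hypothetical, and CUB-200-2011 instead ships its split in a train_test_split.txt file.

```python
from torchvision.datasets import CocoCaptions

# Hypothetical local paths; the annotation file names are the
# standard COCO 2014 caption annotations.
train = CocoCaptions(root="data/coco/train2014",
                     annFile="data/coco/annotations/captions_train2014.json")
val = CocoCaptions(root="data/coco/val2014",
                   annFile="data/coco/annotations/captions_val2014.json")

print(len(train), len(val))  # expected: 82783 40504
```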