Controllable Text-to-Image Generation
Authors: Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, Philip H. S. Torr
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on benchmark datasets demonstrate that our method outperforms existing state of the art, and is able to effectively manipulate synthetic images using natural language descriptions. |
| Researcher Affiliation | Academia | Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, Philip H. S. Torr University of Oxford {bowen.li, thomas.lukasiewicz}@cs.ox.ac.uk {xiaojuan.qi, philip.torr}@eng.ox.ac.uk |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/mrlibw/ControlGAN. |
| Open Datasets | Yes | Our method is evaluated on the CUB bird [23] and the MS COCO [10] datasets. |
| Dataset Splits | Yes | The CUB dataset contains 8,855 training images and 2,933 test images... As for the COCO dataset, it contains 82,783 training images and 40,504 validation images... |
| Hardware Specification | No | The paper does not specify the hardware used for training or experimentation (e.g., GPU models, CPU types, memory). It mentions software components and training parameters, but gives no hardware details. |
| Software Dependencies | No | The paper mentions models (VGG-16, LSTM) and optimizers (Adam) but does not provide specific version numbers for any software dependencies like programming languages, libraries, or frameworks (e.g., Python version, PyTorch/TensorFlow version, CUDA version). |
| Experiment Setup | Yes | There are three stages (K = 3) in our ControlGAN generator following [25]. The three scales are 64×64, 128×128, and 256×256... The text encoder is a pre-trained bidirectional LSTM... The whole network is trained using the Adam optimiser [8] with the learning rate 0.0002. The hyper-parameters λ1, λ2, λ3, and λ4 are set to 0.5, 1, 1, and 5 for both datasets, respectively. (A hedged sketch of this setup follows the table.) |
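
To make the reported configuration concrete, here is a minimal PyTorch sketch of the training setup. Only the three output scales (64×64, 128×128, 256×256), the Adam learning rate of 0.0002, and the loss weights λ1–λ4 = 0.5, 1, 1, 5 come from the paper; the placeholder modules and the Adam betas are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the reported training configuration.
# Only SCALES, the learning rate, and LOSS_WEIGHTS come from the paper;
# the toy modules and the Adam betas below are assumptions.
import torch
import torch.nn as nn

SCALES = (64, 128, 256)  # K = 3 generator stages, per the paper

# Hyper-parameters lambda_1..lambda_4, reported for both CUB and COCO.
LOSS_WEIGHTS = {"lambda1": 0.5, "lambda2": 1.0, "lambda3": 1.0, "lambda4": 5.0}

# Toy stand-ins for the multi-stage generator and the per-scale
# discriminators; the real ControlGAN modules use word-level attention,
# which this sketch deliberately omits.
netG = nn.Sequential(nn.Linear(100, 3 * SCALES[0] * SCALES[0]))  # hypothetical
netDs = [nn.Sequential(nn.Flatten(), nn.Linear(3 * s * s, 1)) for s in SCALES]

# Adam with the paper's learning rate of 0.0002; the betas follow common
# GAN practice (e.g., AttnGAN) and are an assumption here.
optG = torch.optim.Adam(netG.parameters(), lr=2e-4, betas=(0.5, 0.999))
optDs = [torch.optim.Adam(d.parameters(), lr=2e-4, betas=(0.5, 0.999))
         for d in netDs]
```

Using one discriminator per scale mirrors the stacked-GAN design of [25] that the paper says it follows, where each stage's output is judged at its own resolution.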