Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Controllable Text-to-Image Generation

Authors: Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, Philip Torr

NeurIPS 2019 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments on benchmark datasets demonstrate that our method outperforms existing state of the art, and is able to effectively manipulate synthetic images using natural language descriptions. |
| Researcher Affiliation | Academia | Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, Philip H. S. Torr; University of Oxford; EMAIL EMAIL |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/mrlibw/ControlGAN. |
| Open Datasets | Yes | Our method is evaluated on the CUB bird [23] and the MS COCO [10] datasets. |
| Dataset Splits | Yes | The CUB dataset contains 8,855 training images and 2,933 test images... As for the COCO dataset, it contains 82,783 training images and 40,504 validation images... |
| Hardware Specification | No | The paper does not specify the hardware used for training or experimentation (e.g., GPU models, CPU types, or memory); it mentions software components and training parameters, but no hardware details. |
| Software Dependencies | No | The paper mentions models (VGG-16, LSTM) and an optimizer (Adam) but provides no version numbers for software dependencies (e.g., Python, PyTorch/TensorFlow, or CUDA versions). |
| Experiment Setup | Yes | There are three stages (K = 3) in our ControlGAN generator following [25]. The three scales are 64x64, 128x128, and 256x256... The text encoder is a pre-trained bidirectional LSTM... The whole network is trained using the Adam optimiser [8] with the learning rate 0.0002. The hyper-parameters λ1, λ2, λ3, and λ4 are set to 0.5, 1, 1, and 5 for both datasets, respectively. |
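The reported training configuration can be sketched in PyTorch. Only the learning rate (0.0002) and the λ1–λ4 weights (0.5, 1, 1, 5) come from the paper; the loss-term names, the toy stand-in module, and the Adam betas are illustrative assumptions, not the authors' implementation.

```python
import torch

# Settings reported in the paper: Adam optimiser, learning rate 0.0002,
# loss weights lambda1..lambda4 = 0.5, 1, 1, 5 (both datasets).
LR = 0.0002
LAMBDA1, LAMBDA2, LAMBDA3, LAMBDA4 = 0.5, 1.0, 1.0, 5.0

def generator_objective(adv, term1, term2, term3, term4):
    """Weighted sum of the generator's loss terms.

    The paper weights four auxiliary terms by lambda1..lambda4; the
    placeholder names term1..term4 here are assumptions, not the
    paper's notation.
    """
    return adv + LAMBDA1 * term1 + LAMBDA2 * term2 + LAMBDA3 * term3 + LAMBDA4 * term4

# A toy module stands in for the three-stage (64/128/256) generator,
# which is not reproduced here.
toy_generator = torch.nn.Linear(100, 128)

# Adam betas are not specified in the paper; (0.5, 0.999) is a common
# GAN default and is an assumption here.
optimizer = torch.optim.Adam(toy_generator.parameters(), lr=LR, betas=(0.5, 0.999))
```

With unit losses, the weighted objective evaluates to 1 + 0.5 + 1 + 1 + 5 = 8.5, which is a quick way to check the weighting.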