Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language

Authors: Seonghyeon Nam, Yunji Kim, Seon Joo Kim

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our method outperforms existing methods on CUB and Oxford-102 datasets, and our results were mostly preferred on a user study.
Researcher Affiliation | Academia | Seonghyeon Nam, Yunji Kim, and Seon Joo Kim, Yonsei University, {shnnam,kim_yunji,seonjookim}@yonsei.ac.kr
Pseudocode | No | The paper describes the model architecture and training process in prose and with diagrams, but it does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statement about releasing source code, nor does it provide a link to a code repository.
Open Datasets | Yes | We evaluated our method on CUB dataset [18] and Oxford-102 dataset [19], which are well-known public datasets.
Dataset Splits | No | The paper states that experiments were conducted on the CUB and Oxford-102 datasets and mentions using a 'test set', but it does not provide specific details on the train, validation, and test splits (e.g., percentages or exact sample counts), nor does it explicitly refer to a predefined standard split.
Hardware Specification | No | The paper mentions using PyTorch for implementation but does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using PyTorch, fastText word vectors, and the Adam optimizer, but it does not specify any version numbers for these software components.
Experiment Setup | Yes | We trained our network 600 epochs using Adam optimizer [29] with the learning rate of 0.0002, the momentum of 0.5, and the batch size of 64. Also, we decreased the learning rate by 0.5 for every 100 epochs. For data augmentation, we used random cropping, flipping, and rotation. We resized images to 136×136 and randomly cropped 128×128 patches. The random rotation ranged from -10 to 10 degrees. We set λ1 and λ2 to 10 and 2 respectively considering both the visual quality and the training stability.
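
The Experiment Setup row above describes a conventional GAN training configuration, and a minimal sketch of it in PyTorch/torchvision is given below for concreteness. This is an illustrative reading only, not the authors' released code: the `G`/`D` modules and the training-loop body are placeholders (the paper's architecture and TAGAN loss terms are not reproduced here), "momentum of 0.5" is interpreted as Adam's beta1 with beta2 left at the library default, and the augmentation pipeline assumes standard torchvision transforms.

```python
# Sketch of the reported training setup, assuming PyTorch/torchvision
# (the paper names PyTorch but gives no version numbers).
import torch
from torch import nn, optim
from torchvision import transforms

# Data augmentation: resize to 136x136, random 128x128 crop, horizontal flip,
# and random rotation in [-10, 10] degrees, as stated in the paper.
train_transform = transforms.Compose([
    transforms.Resize((136, 136)),
    transforms.RandomRotation(10),
    transforms.RandomCrop(128),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Hypothetical stand-ins for the generator and discriminator; the real
# architectures are described in the paper's figures, not reproduced here.
G = nn.Sequential(nn.Linear(1, 1))  # placeholder
D = nn.Sequential(nn.Linear(1, 1))  # placeholder

# Adam with lr = 0.0002 and "momentum" 0.5 read as beta1; beta2 = 0.999 is the
# PyTorch default, assumed rather than stated in the paper.
opt_g = optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

# Learning rate halved every 100 epochs over 600 epochs of training.
sched_g = optim.lr_scheduler.StepLR(opt_g, step_size=100, gamma=0.5)
sched_d = optim.lr_scheduler.StepLR(opt_d, step_size=100, gamma=0.5)

batch_size = 64
num_epochs = 600
lambda1, lambda2 = 10.0, 2.0  # loss-term weights quoted in the paper

for epoch in range(num_epochs):
    # ... one pass over the training loader with batch_size = 64: update D,
    # then G, weighting the auxiliary loss terms by lambda1 and lambda2 ...
    sched_d.step()
    sched_g.step()
```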