Hierarchical Image Generation via Transformer-Based Sequential Patch Selection

Authors: Xiaogang Xu, Ning Xu

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluated on the challenging Visual Genome and COCO-Stuff datasets, our experimental results demonstrate the superiority of our proposed method over existing state-of-the-art methods.
Researcher Affiliation | Collaboration | Xiaogang Xu (1), Ning Xu (2); (1) Department of Computer Science and Engineering, The Chinese University of Hong Kong; (2) Adobe Research; xgxu@cse.cuhk.edu.hk, nxu@adobe.com
Pseudocode | No | The paper contains architectural diagrams but no structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include an explicit statement about releasing its source code or a link to a code repository.
Open Datasets | Yes | The COCO-Stuff (Caesar, Uijlings, and Ferrari 2018) and Visual Genome (Krishna et al. 2017) datasets are standard benchmark datasets for evaluating scene-graph-to-image generation models.
Dataset Splits | Yes | We follow the protocol in sg2im (Johnson, Gupta, and Fei-Fei 2018) to pre-process the dataset and complete the train-test split.
Hardware Specification | No | The paper states 'We implement with PyTorch... and train SCSM and PSGIM...', but does not provide specific details about the hardware used for training or inference, such as GPU models or CPU specifications.
Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al. 2017)' and 'ImageNet-pretrained VGG-16 network (Simonyan and Zisserman 2015)', but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We implement with PyTorch (Paszke et al. 2017) and train SCSM and PSGIM for 90 epochs on both the COCO-Stuff and Visual Genome datasets. In addition, we use the Adam optimizer (Kingma and Ba 2015) with a batch size of 16. The learning rates for the generator and discriminator are both 0.0001, and the exponential decay rates (β1, β2) are set to (0, 0.9). We set the hyper-parameters as follows: λ1 = 1.0, λ2 = 1.0, λ3 = 0.02, λ4 = 1.0. For the training of SCSM, the proportion between positive and negative samples is 1:10. The number of candidate crops for each object during inference is 5. The crop size is set to 64x64 and 32x32 for COCO-Stuff and Visual Genome, respectively.
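The quoted setup is concrete enough to sketch in code. The following is a minimal, illustrative PyTorch configuration assembled only from the hyper-parameters quoted above; the generator and discriminator modules are hypothetical placeholders, since the authors did not release their SCSM/PSGIM implementations.

# Minimal sketch of the training configuration quoted above. The two
# nn.Linear modules are hypothetical stand-ins for the paper's generator
# and discriminator, whose architectures are not released.
from torch import nn, optim

generator = nn.Linear(128, 128)     # placeholder for the SCSM/PSGIM generator
discriminator = nn.Linear(128, 1)   # placeholder for the discriminator

# Adam with lr = 0.0001 and (beta1, beta2) = (0, 0.9) for both networks.
g_opt = optim.Adam(generator.parameters(), lr=1e-4, betas=(0.0, 0.9))
d_opt = optim.Adam(discriminator.parameters(), lr=1e-4, betas=(0.0, 0.9))

BATCH_SIZE = 16
EPOCHS = 90

# Loss-term weights as reported: (lambda1, lambda2, lambda3, lambda4).
LOSS_WEIGHTS = (1.0, 1.0, 0.02, 1.0)

# SCSM training samples positives and negatives at a 1:10 ratio;
# inference keeps 5 candidate crops per object.
POS_NEG_RATIO = (1, 10)
NUM_CANDIDATE_CROPS = 5

# Crop sizes: 64x64 for COCO-Stuff, 32x32 for Visual Genome.
CROP_SIZE = {"coco_stuff": 64, "visual_genome": 32}

Even with every numeric choice pinned down this way, the missing hardware details and unversioned dependencies noted in the rows above would still leave room for drift in an attempted reproduction.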