Hierarchical Image Generation via Transformer-Based Sequential Patch Selection
Authors: Xiaogang Xu, Ning Xu
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluated on the challenging Visual Genome and COCO-Stuff datasets, our experimental results demonstrate the superiority of our proposed method over existing state-of-the-art methods. |
| Researcher Affiliation | Collaboration | Xiaogang Xu,1 Ning Xu 2 1 Department of Computer Science and Engineering, The Chinese University of Hong Kong 2 Adobe Research xgxu@cse.cuhk.edu.hk, nxu@adobe.com |
| Pseudocode | No | The paper contains architectural diagrams but no structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include an explicit statement about releasing its source code or a link to a code repository. |
| Open Datasets | Yes | The COCO-Stuff (Caesar, Uijlings, and Ferrari 2018) and Visual Genome (Krishna et al. 2017) datasets are standard benchmark datasets for evaluating scene-graph-to-image generation models. |
| Dataset Splits | Yes | We follow the protocol in sg2im (Johnson, Gupta, and Fei-Fei 2018) to pre-process the dataset and complete the train-test split. |
| Hardware Specification | No | The paper states 'We implement with PyTorch... and train SCSM and PSGIM...', but does not provide specific details about the hardware used for training or inference, such as GPU models or CPU specifications. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al. 2017)' and the 'ImageNet-pretrained VGG-16 network (Simonyan and Zisserman 2015)', but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We implement with PyTorch (Paszke et al. 2017) and train SCSM and PSGIM for 90 epochs on both the COCO-Stuff and Visual Genome datasets. In addition, we use the Adam optimizer (Kingma and Ba 2015) with a batch size of 16. The learning rates for the generator and discriminator are both 0.0001, and the exponential decay rates (β1, β2) are set to (0, 0.9). We set the hyper-parameters as follows: λ1 = 1.0, λ2 = 1.0, λ3 = 0.02, λ4 = 1.0. For the training of SCSM, the proportion between positive samples and negative samples is 1:10. The number of candidate crops for each object during inference is 5. The crop size is set to 64×64 and 32×32 for COCO-Stuff and Visual Genome, respectively. (A minimal sketch of these settings follows the table.) |
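The training configuration quoted above maps directly onto standard PyTorch calls. The sketch below is a minimal illustration of the reported settings (Adam with learning rate 0.0001 and betas (0, 0.9), batch size 16, 90 epochs, loss weights λ1–λ4, and the per-dataset crop sizes). The placeholder `generator` and `discriminator` modules and the pairing of each λ with a specific loss term are assumptions, since the paper does not release code.

```python
import torch
from torch.optim import Adam

# Hyper-parameters as reported in the paper's experiment setup.
LR = 1e-4                        # learning rate for generator and discriminator
BETAS = (0.0, 0.9)               # Adam exponential decay rates (beta1, beta2)
BATCH_SIZE = 16
EPOCHS = 90
LAMBDAS = (1.0, 1.0, 0.02, 1.0)  # loss weights lambda_1..lambda_4
POS_NEG_RATIO = (1, 10)          # positive:negative sample ratio for SCSM training
NUM_CANDIDATE_CROPS = 5          # candidate crops per object at inference
CROP_SIZE = {"coco-stuff": 64, "visual-genome": 32}  # square crop side length

# Placeholder modules standing in for the paper's SCSM/PSGIM networks,
# whose architectures are not reproduced here.
generator = torch.nn.Linear(128, 128)
discriminator = torch.nn.Linear(128, 1)

opt_g = Adam(generator.parameters(), lr=LR, betas=BETAS)
opt_d = Adam(discriminator.parameters(), lr=LR, betas=BETAS)

def total_generator_loss(l1, l2, l3, l4):
    """Weighted sum of four loss terms; which lambda weights which
    term is an assumption, as the paper only lists the values."""
    w1, w2, w3, w4 = LAMBDAS
    return w1 * l1 + w2 * l2 + w3 * l3 + w4 * l4
```

Note that β1 = 0 disables Adam's first-moment accumulation, a common choice in GAN training; the paper lists the values but does not discuss the rationale.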