Locally Hierarchical Auto-Regressive Modeling for Image Generation
Authors: Tackgeun You, Saehoon Kim, Chiheon Kim, Doyup Lee, Bohyung Han
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our models, referred to as HQ-TVAE hereafter, on class- and text-conditional image generation tasks. |
| Researcher Affiliation | Collaboration | Tackgeun You (3,5; tackgeun.you@postech.ac.kr), Saehoon Kim (4; shkim@kakaobrain.com), Chiheon Kim (4; chiheon.kim@kakaobrain.com), Doyup Lee (4; doyup.lee@kakaobrain.com), Bohyung Han (1,2,3; bhhan@snu.ac.kr). Affiliations: 1 ECE, 2 IPAI, 3 AIIS, Seoul National University, Korea; 4 Kakao Brain, Korea; 5 CSE, POSTECH, Korea |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We report in both the paper and the supplementary material. |
| Open Datasets | Yes | We train our models on 1.2M images in the train split of ImageNet [29] for class-conditional image generation. For text-conditional tasks, we employ 15M image-text pairs in Conceptual Captions (CC) [30] and Conceptual-12M [31]. |
| Dataset Splits | Yes | We train our models on 1.2M images in the train split of ImageNet [29] for class-conditional image generation. For text-conditional tasks, we employ 15M image-text pairs in Conceptual Captions (CC) [30] and Conceptual-12M [31]. ... Table 2: Text-conditional image generation performance on the CC3M validation set. ... Table 3: Comparison of image reconstruction quality on the ImageNet validation set. |
| Hardware Specification | Yes | We measure the throughput of sample generation on a single Tesla A100 GPU. |
| Software Dependencies | No | The paper describes network structures and normalization techniques but does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | The scaling factor in HQ-VAE is set to 2, i.e., r = 2, to produce the top and bottom codes of 8×8 and 16×16 resolution by default, respectively. We test three versions of HQ-Transformer by varying hyperparameters: (a) N_MT = 12 and N_PHT = 4 for the smallest model, (b) N_MT = 24 and N_PHT = 4 for the mid-size model, and (c) N_MT = 42 and N_PHT = 6 for our largest model. In all cases, N_IET = 1 and other parameters related to the network size are fixed. |
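The Hardware Specification row above states only that sampling throughput was measured on a single Tesla A100 GPU; the paper does not provide the measurement script. The sketch below illustrates a common protocol for such a measurement (warm-up pass, CUDA synchronization around the timed region). It assumes a hypothetical `model.sample(batch_size=...)` interface, which is not the authors' actual API.

```python
import time
import torch

def sampling_throughput(model, n_batches: int = 50, batch_size: int = 16) -> float:
    """Return generated images per second on the current GPU.

    `model.sample` is an assumed interface, used here for illustration only.
    """
    model.eval()
    with torch.no_grad():
        model.sample(batch_size=batch_size)  # warm-up to exclude one-time setup cost
        torch.cuda.synchronize()             # flush queued GPU work before timing
        start = time.perf_counter()
        for _ in range(n_batches):
            model.sample(batch_size=batch_size)
        torch.cuda.synchronize()             # ensure all batches actually finished
        elapsed = time.perf_counter() - start
    return n_batches * batch_size / elapsed
```

The two `torch.cuda.synchronize()` calls matter because CUDA kernels launch asynchronously; without them, wall-clock timing would measure kernel launch overhead rather than actual generation time.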
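The Experiment Setup row pins down the three HQ-Transformer sizes by their layer-count hyperparameters. As a reading aid, here is a minimal Python sketch of those settings as a configuration object; the field names `n_mt`, `n_pht`, `n_iet`, and `scale_factor` are our shorthand for the paper's N_MT, N_PHT, N_IET, and r, not identifiers from the authors' released code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HQTransformerConfig:
    # Layer counts the paper denotes N_MT, N_PHT, and N_IET.
    # Field names are illustrative shorthand, not the authors' API.
    n_mt: int             # N_MT, varied across the three model sizes
    n_pht: int            # N_PHT, varied across the three model sizes
    n_iet: int = 1        # N_IET, fixed to 1 in all reported configurations
    scale_factor: int = 2 # r; with an 8x8 top code this yields a 16x16 bottom code

# The three variants reported in the paper's experiment setup.
SMALL = HQTransformerConfig(n_mt=12, n_pht=4)
MID   = HQTransformerConfig(n_mt=24, n_pht=4)
LARGE = HQTransformerConfig(n_mt=42, n_pht=6)
```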