Locally Hierarchical Auto-Regressive Modeling for Image Generation

Authors: Tackgeun You, Saehoon Kim, Chiheon Kim, Doyup Lee, Bohyung Han

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our models, referred to as HQ-TVAE hereafter, on class- and text-conditional image generation tasks.
Researcher Affiliation | Collaboration | Tackgeun You (3,5) tackgeun.you@postech.ac.kr; Saehoon Kim (4) shkim@kakaobrain.com; Chiheon Kim (4) chiheon.kim@kakaobrain.com; Doyup Lee (4) doyup.lee@kakaobrain.com; Bohyung Han (1,2,3) bhhan@snu.ac.kr. Affiliations: 1 ECE, 2 IPAI, 3 AIIS, Seoul National University, Korea; 4 Kakao Brain, Korea; 5 CSE, POSTECH, Korea.
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] We report in both the paper and the supplementary material.
Open Datasets | Yes | We train our models on 1.2M images in the train split of ImageNet [29] for class-conditional image generation. For text-conditional tasks, we employ 15M image-text pairs in Conceptual Captions (CC) [30] and Conceptual 12M [31].
Dataset Splits | Yes | We train our models on 1.2M images in the train split of ImageNet [29] for class-conditional image generation. For text-conditional tasks, we employ 15M image-text pairs in Conceptual Captions (CC) [30] and Conceptual 12M [31]. ... Table 2: Text-conditional image generation performance on the CC3M validation set. ... Table 3: Comparison of image reconstruction quality on the ImageNet validation set.
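A minimal sketch of the split usage described in these two rows, assuming torchvision's ImageNet loader; the paper does not specify its data pipeline, and paths, crop sizes, and the CC3M/CC12M handling below are our assumptions:

```python
# Hypothetical data setup mirroring the quoted splits: ImageNet train for
# class-conditional training, ImageNet val for reconstruction evaluation.
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),        # assumed 256x256 inputs (not stated here)
    transforms.CenterCrop(256),
    transforms.ToTensor(),
])

imagenet_train = datasets.ImageNet(
    root="/data/imagenet", split="train", transform=preprocess
)
imagenet_val = datasets.ImageNet(
    root="/data/imagenet", split="val", transform=preprocess
)
# The 15M CC3M/CC12M image-text pairs would be loaded analogously from
# their released manifests; no loader is specified in the paper.
```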
Hardware Specification | Yes | We measure the throughput of sample generation on a single Tesla A100 GPU.
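For context, a hedged sketch of how such a single-GPU throughput number is typically measured; the `model.sample` API below is hypothetical, since the paper only names the GPU:

```python
import time
import torch

@torch.no_grad()
def images_per_second(model, batch_size=32, n_batches=10, device="cuda"):
    """Time end-to-end sample generation on a single GPU."""
    model = model.eval().to(device)
    torch.cuda.synchronize()           # drain pending kernels before timing
    start = time.time()
    for _ in range(n_batches):
        model.sample(batch_size)       # hypothetical generation API
    torch.cuda.synchronize()           # wait for the last batch to finish
    return batch_size * n_batches / (time.time() - start)
```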
Software Dependencies | No | The paper describes network structures and normalization techniques but does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | The scaling factor in HQ-VAE is set to 2, i.e., r = 2, to produce the top and bottom codes of 8×8 and 16×16 resolution by default, respectively. We test three versions of HQ-Transformer by varying hyperparameters: (a) N_MT = 12 and N_PHT = 4 for the smallest model, (b) N_MT = 24 and N_PHT = 4 for the mid-size model, and (c) N_MT = 42 and N_PHT = 6 for our largest model. In all cases, N_IET = 1 and other parameters related to the network size are fixed.
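These three configurations can be tabulated directly. Below is a minimal sketch using a hypothetical config dataclass; the field names are ours, while the depths and code resolutions come from the quote above:

```python
from dataclasses import dataclass

@dataclass
class HQTransformerConfig:
    # Depth hyperparameters in the paper's notation; other size-related
    # parameters are fixed across all three models.
    n_mt: int             # N_MT
    n_pht: int            # N_PHT
    n_iet: int = 1        # N_IET = 1 in all cases
    top_res: int = 8      # top code resolution with r = 2 (8x8)
    bottom_res: int = 16  # bottom code resolution (16x16 = r * 8)

small = HQTransformerConfig(n_mt=12, n_pht=4)    # (a) smallest model
medium = HQTransformerConfig(n_mt=24, n_pht=4)   # (b) mid-size model
large = HQTransformerConfig(n_mt=42, n_pht=6)    # (c) largest model
```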