Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer

Authors: Doyup Lee, Chiheon Kim, Saehoon Kim, Minsu Cho, Wook-Shin Han

NeurIPS 2022

Reproducibility Variable Result LLM Response
Research Type | Experimental | "In experiments, our method achieves state-of-the-art results on conditional image generation. We also validate that the Draft-and-Revise decoding can achieve high performance by effectively controlling the quality-diversity trade-off in image generation."
Researcher Affiliation | Collaboration | Doyup Lee (Kakao Brain, doyup.lee@kakaobrain.com); Chiheon Kim (Kakao Brain, chiheon.kim@kakaobrain.com); Saehoon Kim (Kakao Brain, shkim@kakaobrain.com); Minsu Cho (POSTECH, mscho@postech.ac.kr); Wook-Shin Han (POSTECH, wshan@dblab.postech.ac.kr)
Pseudocode | Yes | Algorithm 1 (UPDATE of S) and Algorithm 2 (Draft-and-Revise decoding)
Open Source Code | No | The paper contains no unambiguous statement of a code release and no direct link to a source code repository for the described method.
Open Datasets | Yes | "We train Contextual RQ-Transformer with 333M, 821M, and 1.4B parameters on ImageNet [7] for class-conditional image generation. ... We train our model with 333M and 654M parameters on CC-3M [32] for text-to-image (T2I) generation..."
Dataset Splits | No | Although Table 1 includes a 'Validation Data' row, the paper does not give the split information (exact percentages, sample counts, citations to predefined splits, or a splitting methodology) needed to reproduce the training/validation/test partitioning.
Hardware Specification | No | The paper does not specify the hardware used for its experiments (exact GPU/CPU models, processor speeds, or memory amounts).
Software Dependencies | No | The paper does not list the ancillary software needed to replicate the experiments (e.g., library or solver names with version numbers, such as Python 3.8 or CPLEX 12.4).
Experiment Setup | Yes | "For Draft-and-Revise decoding, we use Tdraft = 64, Trevise = 2, and M = 2. We use temperature scaling [19] and classifier-free guidance [20] only in the revise phase, while none of the strategies are applied in the draft phase."
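To make the reported setup concrete, the overall control flow of Draft-and-Revise decoding (Algorithm 2 built on the UPDATE routine of Algorithm 1) can be sketched as below. This is a toy illustration only: the real method samples code maps from a Contextual RQ-Transformer, whereas here `update` resamples positions uniformly from a hypothetical 256-entry codebook, and the `temperature` argument merely marks where temperature scaling and classifier-free guidance would apply (the revise phase only, per the paper's settings Tdraft = 64, Trevise = 2, M = 2).

```python
import random

def update(codes, steps, temperature=1.0):
    """Toy stand-in for Algorithm 1 (UPDATE): resample a subset of code
    positions over `steps` iterations. A real implementation would sample
    each position from the model's predicted distribution; `temperature`
    is unused here (a real model would divide its logits by it)."""
    n = len(codes)
    for _ in range(steps):
        # resample roughly n/steps positions per iteration
        for i in random.sample(range(n), max(1, n // steps)):
            codes[i] = random.randrange(256)  # 256 = toy codebook size
    return codes

def draft_and_revise(n_positions=64, T_draft=64, T_revise=2, M=2):
    """Sketch of Algorithm 2 with the paper's reported hyperparameters.
    Draft phase: generate a full code sequence without temperature
    scaling or guidance. Revise phase: M short rounds of updates where
    those strategies would be enabled."""
    # draft phase: fill every position, then refine over T_draft steps
    codes = [random.randrange(256) for _ in range(n_positions)]
    codes = update(codes, T_draft)
    # revise phase: M rounds of T_revise update steps with guidance on
    for _ in range(M):
        codes = update(codes, T_revise, temperature=0.9)  # hypothetical value
    return codes
```

The key design point the sketch preserves is the asymmetry between phases: a long, unguided draft pass establishes diversity, and short guided revise passes trade some of that diversity for quality.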