Reject Decoding via Language-Vision Models for Text-to-Image Synthesis

Authors: Fuxiang Wu, Liu Liu, Fusheng Hao, Fengxiang He, Lei Wang, Jun Cheng

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments on the MS-COCO dataset and on large-scale datasets show that the proposed reject decoding algorithm excludes useless paths and enlarges the set of searched paths, improving synthesis quality while consuming less time. The authors conduct extensive experiments with a base model trained on MS-COCO and a large-scale model trained on large-scale datasets to verify the efficiency of the reject decoding algorithm and the effectiveness of the multimodal vision models.
Researcher Affiliation | Collaboration | Fuxiang Wu (1,2), Liu Liu (3), Fusheng Hao (1,2), Fengxiang He (4), Lei Wang (1,2), Jun Cheng (1,2). 1: Guangdong Provincial Key Laboratory of Robotics and Intelligent System, Shenzhen Institute of Advanced Technology, CAS, China; 2: The Chinese University of Hong Kong, Hong Kong, China; 3: School of Computer Science, Faculty of Engineering, The University of Sydney, Australia; 4: JD Explore Academy, JD.com Inc., Beijing, China
Pseudocode | Yes | Algorithm 1: Original Decoding in Transformer; Algorithm 2: Reject Decoding in Transformer; Algorithm 3: Searching Reject Threshold. (A hedged sketch of a reject-decoding step is given after the table.)
Open Source Code | No | The paper provides neither a link to nor an explicit statement about the availability of the authors' source code for the proposed method.
Open Datasets | Yes | Extensive experiments are conducted with a base model trained on the MS-COCO dataset (Lin et al. 2014) as the normal model, denoted by the superscript coco. For the large-scale setting, the authors use the pre-trained RQ-Transformer with 3.9B parameters (github.com/kakaobrain/rq-vae-transformer), denoted by the superscript pre, which is trained on CC-3M (github.com/google-research-datasets/conceptual-captions), CC-12M (github.com/google-research-datasets/conceptual-12m), and a YFCC subset (github.com/openai/CLIP/blob/main/data/yfcc100m.md).
Dataset Splits | No | The paper mentions training on these datasets but does not detail how the data was split into training, validation, and test sets, nor does it explicitly mention a validation set.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU model, CPU type, memory) used for the experiments; it only refers to 'the same device' when reporting runtimes.
Software Dependencies | No | The paper mentions using GPT-2 and the RQ-Transformer but does not list any software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or solvers).
Experiment Setup | Yes | In Algorithm 2, the group size is M = 8 and the total number of tokens is 64, so 8 similarity models {M_i}, i = 1, ..., 8, are constructed. Figure 4 shows the influence of Ne with Nb = 20. The multimodal vision models consist of 8 layers with 4 heads. (See the configuration sketch after the table.)
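
To make the pseudocode row concrete, the following is a minimal sketch of what one reject-decoding step might look like, based only on the paper's high-level description: candidates whose score under a multimodal similarity model falls below a reject threshold are pruned before the search ranks the survivors. All names here (reject_decoding_step, reject_threshold, the fallback behavior) are illustrative assumptions, not the authors' implementation.

```python
import torch

def reject_decoding_step(logits, similarity_scores, reject_threshold, beam_width):
    """Hedged sketch of one reject-decoding step (not the authors' code).

    logits:            (num_candidates,) scores from the autoregressive transformer
    similarity_scores: (num_candidates,) scores from a multimodal similarity model
    reject_threshold:  candidates below this are rejected (Algorithm 3 in the
                       paper searches for this value; here it is assumed given)
    beam_width:        number of surviving candidates to keep
    """
    # Reject candidates the similarity model deems unpromising...
    keep = similarity_scores >= reject_threshold
    if not keep.any():
        # Assumed fallback: if everything is rejected, revert to plain top-k.
        keep = torch.ones_like(keep)
    # ...then rank the survivors by the transformer's own scores.
    masked_logits = logits.masked_fill(~keep, float("-inf"))
    k = min(beam_width, int(keep.sum()))
    top = torch.topk(masked_logits, k)
    return top.indices, top.values

# Toy usage with random scores.
torch.manual_seed(0)
indices, values = reject_decoding_step(
    logits=torch.randn(16),
    similarity_scores=torch.rand(16),
    reject_threshold=0.5,
    beam_width=4,
)
print(indices, values)
```

The intended point of the sketch is the pruning order: the similarity model filters first, so the transformer's top-k runs over fewer paths, which is consistent with the paper's claim of excluding useless paths while consuming less time.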
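
Similarly, the experiment-setup row reports 8-layer, 4-head multimodal vision models and a group size of M = 8 over 64 tokens, i.e., one similarity model per token group. Below is a minimal configuration sketch under those reported numbers, assuming a standard transformer encoder; the hidden size (d_model=256), vocabulary size, and the scalar scoring head are our guesses, since the paper does not specify them.

```python
import torch
import torch.nn as nn

class SimilarityModel(nn.Module):
    """Hedged sketch of one similarity model M_i: 8 layers, 4 heads as reported;
    d_model, vocab_size, and the scoring head are assumptions."""

    def __init__(self, vocab_size=16384, d_model=256, n_layers=8, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.score = nn.Linear(d_model, 1)  # scalar match score per sequence

    def forward(self, tokens):
        h = self.encoder(self.embed(tokens))
        return self.score(h.mean(dim=1)).squeeze(-1)

# One model per token group: group size M = 8 over 64 tokens -> 8 models.
models = nn.ModuleList(SimilarityModel() for _ in range(8))
scores = models[0](torch.randint(0, 16384, (2, 8)))  # batch of 2, one 8-token group
print(scores.shape)  # torch.Size([2])
```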