reproducibilityindex.ai

UFC-BERT: Unifying Multi-Modal Controls for Conditional Image Synthesis

Authors: Zhu Zhang, Jianxin Ma, Chang Zhou, Rui Men, Zhikang Li, Ming Ding, Jie Tang, Jingren Zhou, Hongxia Yang

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on a newly collected large-scale clothing dataset M2C-Fashion and a facial dataset Multi-Modal CelebA-HQ verify that UFC-BERT can synthesize high-fidelity images that comply with flexible multi-modal controls.
Researcher Affiliation	Collaboration	Zhu Zhang, Jianxin Ma, Chang Zhou, Rui Men, Zhikang Li, Ming Ding, Jie Tang, Jingren Zhou, and Hongxia Yang DAMO Academy, Alibaba Group, Tsinghua University {zhangzhu950310}@gmail.com {jason.mjx, ericzhou.zc, yang.yhx}@alibaba-inc.com
Pseudocode	No	The paper describes algorithms such as Mask-Predict and Progressive Non-Autoregressive Generation (PNAG) in detail but does not present them as structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets	Yes	We additionally use another high-resolution facial dataset Multi-Modal CelebA-HQ [28, 61].
Dataset Splits	No	The paper uses two datasets (M2C-Fashion and Multi-Modal CelebA-HQ) but does not specify how they were split into training, validation, and test sets.
Hardware Specification	Yes	We evaluate speed on the same V100 GPU.
Software Dependencies	No	The paper does not provide specific software dependencies or version numbers (e.g., Python, PyTorch, TensorFlow versions) used for its implementation or experiments.
Experiment Setup	Yes	For the BERT model, we set the number of layers, hidden size, and the number of attention heads to 24, 1024, and 16, respectively. Our UFC-BERT has 307M parameters, same as the Transformer used by VQGAN. As for hyper-parameters of PNAG, we set the parallel decoding number B to 5 and the balance coefficient σ to 0.5. We set the initial mask ratio α, the minimum mask ratio β, and the maximum iteration number T to 0.8, 0.2, and 10, respectively.
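
The reported setup can be collected into a small configuration object, which also allows a rough sanity check of the 307M parameter figure. The following is a minimal sketch, not the authors' code: the class name `UFCBertSetup`, the field names, and the helper `approx_block_params` are hypothetical, and the feed-forward inner size of 4x the hidden size is an assumption that the quoted text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UFCBertSetup:
    """Hyper-parameters quoted in the Experiment Setup row (names are hypothetical)."""
    num_layers: int = 24
    hidden_size: int = 1024
    num_heads: int = 16
    parallel_decoding_b: int = 5    # PNAG parallel decoding number B
    balance_sigma: float = 0.5      # PNAG balance coefficient sigma
    init_mask_ratio: float = 0.8    # initial mask ratio alpha
    min_mask_ratio: float = 0.2     # minimum mask ratio beta
    max_iterations: int = 10        # maximum iteration number T
    # ASSUMPTION (not in the source): feed-forward inner size is 4x hidden size.
    ffn_multiplier: int = 4


def approx_block_params(cfg: UFCBertSetup) -> int:
    """Weights-only estimate; excludes biases, layer norms, and embeddings."""
    d = cfg.hidden_size
    attention = 4 * d * d                          # Q, K, V, and output projections
    feed_forward = 2 * cfg.ffn_multiplier * d * d  # up- and down-projection
    return cfg.num_layers * (attention + feed_forward)


cfg = UFCBertSetup()
# The hidden size must divide evenly across attention heads.
assert cfg.hidden_size % cfg.num_heads == 0
head_dim = cfg.hidden_size // cfg.num_heads
total = approx_block_params(cfg)
print(f"head_dim={head_dim}, approx_block_params={total:,}")
```

Under these assumptions the Transformer blocks alone account for roughly 302M weights, which is in line with the reported 307M once embeddings, biases, and normalization parameters are added. The estimate says nothing about the PNAG values, which affect decoding rather than model size.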