Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
UFC-BERT: Unifying Multi-Modal Controls for Conditional Image Synthesis
Authors: Zhu Zhang, Jianxin Ma, Chang Zhou, Rui Men, Zhikang Li, Ming Ding, Jie Tang, Jingren Zhou, Hongxia Yang
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on a newly collected large-scale clothing dataset M2C-Fashion and a facial dataset Multi-Modal CelebA-HQ verify that UFC-BERT can synthesize high-fidelity images that comply with flexible multi-modal controls. |
| Researcher Affiliation | Collaboration | Zhu Zhang, Jianxin Ma, Chang Zhou, Rui Men, Zhikang Li, Ming Ding, Jie Tang, Jingren Zhou, and Hongxia Yang DAMO Academy, Alibaba Group, Tsinghua University {zhangzhu950310}@gmail.com EMAIL |
| Pseudocode | No | The paper describes algorithms such as Mask-Predict and Progressive Non-Autoregressive Generation (PNAG) in detail but does not provide them in structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We additionally use another high-resolution facial dataset Multi-Modal CelebA-HQ [28, 61]. |
| Dataset Splits | No | The paper mentions using two datasets (M2C-Fashion and Multi-Modal CelebA-HQ) but does not provide specific details on how these datasets were split into training, validation, and test sets for reproducibility. |
| Hardware Specification | Yes | We evaluate speed on the same V100 GPU. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers (e.g., Python, PyTorch, TensorFlow versions) used for its implementation or experiments. |
| Experiment Setup | Yes | For the BERT model, we set the number of layers, hidden size, and the number of attention heads to 24, 1024, and 16, respectively. Our UFC-BERT has 307M parameters, same as the Transformer used by VQGAN. As for hyper-parameters of PNAG, we set the parallel decoding number B to 5 and the balance coefficient σ to 0.5. We set the initial mask ratio α, the minimum mask ratio β, and the maximum iteration number T to 0.8, 0.2, and 10, respectively. |
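
The hyper-parameters quoted in the Experiment Setup row can be collected into a single configuration object, which also allows a rough sanity check of the reported 307M parameter count. The sketch below is illustrative only: the class name, field names, and the `approx_block_params` helper are hypothetical and do not come from the paper, and the estimate assumes a standard Transformer block (four h×h attention projections plus a feed-forward layer with inner width 4h), ignoring embeddings, biases, and layer norms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UFCBertSetup:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    num_layers: int = 24
    hidden_size: int = 1024
    num_attention_heads: int = 16
    parallel_decoding_number: int = 5   # B in PNAG
    balance_coefficient: float = 0.5    # sigma in PNAG
    initial_mask_ratio: float = 0.8     # alpha
    minimum_mask_ratio: float = 0.2     # beta
    max_iterations: int = 10            # T


def approx_block_params(cfg: UFCBertSetup) -> int:
    """Rough weight count for the Transformer blocks only.

    Assumption (not from the paper): each block holds 4*h^2 attention
    weights (Q, K, V, output) and 8*h^2 feed-forward weights (inner
    width 4h). Embeddings, biases, and layer norms are ignored.
    """
    h = cfg.hidden_size
    return 12 * cfg.num_layers * h * h


if __name__ == "__main__":
    cfg = UFCBertSetup()
    print(f"approximate block parameters: {approx_block_params(cfg):,}")
```

Under these assumptions the blocks alone account for roughly 302M weights, which is in the same range as the 307M reported once embeddings and other small terms are added; the exact breakdown is not given in the quoted text.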