Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
DreamLLM: Synergistic Multimodal Comprehension and Creation
Authors: Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, Hongyu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma, Li Yi
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments highlight DREAMLLM's superior performance as a zero-shot multimodal generalist, reaping from the enhanced learning synergy. Project page: dreamllm.github.io. From Section 4 (Experiments): DREAMLLM is a versatile multimodal generalist that excels at zero-shot or in-context vision-language comprehension and synthesis tasks. In this section, we conduct systematic evaluations for demonstration. |
| Researcher Affiliation | Collaboration | Runpei Dong (1,2), Chunrui Han (3), Yuang Peng (4), Zekun Qi (1,2), Zheng Ge (3), Jinrong Yang (5), Liang Zhao (3), Jianjian Sun (3), Hongyu Zhou (3), Haoran Wei (3), Xiangwen Kong (3), Xiangyu Zhang (3), Kaisheng Ma (4), Li Yi (4,6,7). Affiliations: (1) Xi'an Jiaotong University; (2) Institute for Interdisciplinary Information Core Technology (IIISCT); (3) MEGVII Technology; (4) Tsinghua University; (5) HUST; (6) Shanghai Artificial Intelligence Laboratory; (7) Shanghai Qi Zhi Institute |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page: dreamllm.github.io. |
| Open Datasets | Yes | The training data are constructed based on the following datasets: a) LAION400M (Schuhmann et al., 2021), b) LAION-COCO (Schuhmann et al., 2023), c) MMC4 (Zhu et al., 2023b), d) BLIP-LAION (Li et al., 2022)... |
| Dataset Splits | Yes | The MS-COCO dataset primarily contains high-level image abstractions with shorter captions, whereas LN-COCO provides more comprehensive image descriptions (Yu et al., 2022b). DREAMLLM samples 8 images per text prompt on MS-COCO by CLIP score ranking, following previous works (Ramesh et al., 2022). On LN-COCO, DREAMLLM samples one image per prompt without CLIP ranking since the text is too long and exceeds the CLIP length limit. |
| Hardware Specification | Yes | GPU Device: 128 NVIDIA A800 |
| Software Dependencies | No | We use LLaMA-1 (Touvron et al., 2023a) trained on ShareGPT (Zheng et al., 2023) as the default LLM (i.e., Vicuna-7B (Chiang et al., 2023)) following Liu et al. (2023c) to endow its instruction-following capacity. During training, we use FlashAttention (Dao et al., 2022) and PyTorch FSDP (Zhao et al., 2023b) to improve training efficiency. |
| Experiment Setup | Yes | Training hyper-parameters: Optimizer: AdamW; Learning Rate: 2e-3; Weight Decay: 0.0; Training Epochs: 1; Warmup Ratio: 0.003; Learning Rate Scheduler: Cosine; Batch Size per GPU: 8; Maximum Token Length: 2048 |
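The reported hyper-parameters (peak learning rate 2e-3, warmup ratio 0.003, cosine scheduler, 8 sequences per GPU on 128 GPUs) can be read as a concrete learning-rate schedule. The sketch below is an illustration under stated assumptions, not DreamLLM's training code: it assumes linear warmup from zero followed by cosine decay to zero (the shape used by Hugging Face Transformers' `get_cosine_schedule_with_warmup`), pure data parallelism with no gradient accumulation, and a hypothetical `total_steps`, since the number of optimizer steps is not reported. Under those assumptions the global batch is 128 × 8 = 1,024 sequences per step.

```python
import math

# Values taken from the reported configuration.
PEAK_LR = 2e-3
WARMUP_RATIO = 0.003
NUM_GPUS = 128
BATCH_PER_GPU = 8


def global_batch_size(num_gpus: int = NUM_GPUS,
                      per_gpu: int = BATCH_PER_GPU,
                      grad_accum_steps: int = 1) -> int:
    """Sequences per optimizer step, assuming pure data parallelism.

    grad_accum_steps is a hypothetical knob; accumulation is not reported.
    """
    return num_gpus * per_gpu * grad_accum_steps


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = PEAK_LR,
               warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero (assumed shape)."""
    warmup_steps = max(1, round(total_steps * warmup_ratio))
    if step < warmup_steps:
        # Linear ramp from 0 toward peak_lr over the warmup steps.
        return peak_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 10_000  # hypothetical step count, for illustration only
    print("global batch:", global_batch_size())
    for s in (0, 15, 30, 5_015, 10_000):
        print(s, lr_at_step(s, total))
```

With the hypothetical 10,000 steps, a warmup ratio of 0.003 covers only 30 steps, so the schedule is almost entirely cosine decay; the learning rate reaches half its peak at the midpoint of the post-warmup phase.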
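The MS-COCO protocol in the Dataset Splits row (several samples per prompt, ranked by CLIP score) is a best-of-N reranking step. The sketch below assumes the CLIP text and image embeddings have already been computed (the encoder call is not shown), that the highest-scoring sample is the one kept, and that "CLIP length limit" refers to the 77-token context of the text encoder in OpenAI's released CLIP models. `cosine_similarity`, `rerank_by_clip`, and `choose_image` are hypothetical names for illustration, not DreamLLM's evaluation code.

```python
import math
from typing import List, Sequence, Tuple

# Context length of the text encoder in OpenAI's released CLIP models.
CLIP_MAX_TOKENS = 77


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """CLIP-style score: cosine of the angle between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rerank_by_clip(text_emb: Sequence[float],
                   image_embs: Sequence[Sequence[float]],
                   top_k: int = 1) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the top_k images, best first."""
    scored = [(i, cosine_similarity(text_emb, img)) for i, img in enumerate(image_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def choose_image(num_prompt_tokens: int,
                 text_emb: Sequence[float],
                 image_embs: Sequence[Sequence[float]]) -> int:
    """Pick one image per prompt (hypothetical helper).

    Prompts longer than the CLIP text limit (as on LN-COCO) are not ranked;
    the first sample is kept, mirroring the one-sample-per-prompt setting.
    """
    if num_prompt_tokens > CLIP_MAX_TOKENS or len(image_embs) == 1:
        return 0
    best_index, _ = rerank_by_clip(text_emb, image_embs, top_k=1)[0]
    return best_index


if __name__ == "__main__":
    text = [1.0, 0.0]
    # Toy 2-D stand-ins for the embeddings of the sampled images.
    candidates = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
    print(choose_image(num_prompt_tokens=20, text_emb=text, image_embs=candidates))
```

Scoring with cosine similarity rather than a raw dot product keeps the ranking independent of embedding norms, which matches how CLIP scores are usually computed from normalized features.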