Jointly Training Large Autoregressive Multimodal Models

Authors: Emanuele Aiello, Lili Yu, Yixin Nie, Armen Aghajanyan, Barlas Oguz

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To achieve this objective, we conduct a comprehensive empirical investigation into the fusion of two specialized autoregressive, decoder-only, large transformer models, each designed for a distinct task (one text-to-image model and one text-only model). (A minimal fusion sketch appears after this table.)
Researcher Affiliation | Collaboration | Politecnico di Torino, Meta AI
Pseudocode | No | The paper describes its methods in prose and includes architectural diagrams, but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for the methodology described, nor does it provide a link to a code repository.
Open Datasets | Yes | Text corpora: We use 30B text tokens sampled from a mixture of several publicly available datasets, reusing the data used for training other common open-source LLMs and following the same preprocessing as (Touvron et al., 2023). The datasets are: English Common Crawl (Touvron et al., 2023), C4 (Raffel et al., 2020), Wikipedia, Books3 from The Pile (Gao et al., 2020), and arXiv. (A corpus-mixture sketch appears after this table.)
Dataset Splits | No | The paper mentions using 'validation perplexity (PPL)' for model selection and discusses training token counts and epochs, but it does not specify the explicit training, validation, and test dataset splits (e.g., percentages or absolute sample counts) needed to reproduce the data partitioning. While MS-COCO has standard splits, the paper does not explicitly state which split was used for validation or how the custom datasets were split.
Hardware Specification | Yes | This training procedure takes approximately one day on 256 80GB A100s for all models.
Software Dependencies | No | The paper mentions specific models and tokenizers (e.g., 'VQ-VAE tokenizer', 'CM3leon') and objectives, but does not provide specific software dependency names with version numbers (e.g., 'Python 3.8', 'TensorFlow 2.x') required for reproducibility.
Experiment Setup | Yes | Our initial learning rate is lr = 3 × 10^-5 and we use 500 warm-up steps. We set our optimal batch size to 8M tokens. The total number of training steps is 5960. (An optimizer/scheduler sketch using these values appears after this table.)
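
The Research Type row above describes fusing two decoder-only transformers, one text-only and one text-to-image. The snippet below is a minimal, illustrative sketch of one simple fusion strategy, elementwise weight averaging of two architecturally identical checkpoints; the checkpoint paths, the interpolation weight alpha, and the use of plain averaging are assumptions for illustration, not the paper's exact recipe.

```python
# Minimal sketch: fuse two architecturally identical decoder-only transformers
# by averaging their parameters. Illustrative only; the checkpoint paths, the
# interpolation weight `alpha`, and plain weight averaging are assumptions.
import torch

def average_state_dicts(text_ckpt: str, image_ckpt: str, alpha: float = 0.5) -> dict:
    """Interpolate parameters of two checkpoints with matching keys and shapes."""
    text_sd = torch.load(text_ckpt, map_location="cpu")
    image_sd = torch.load(image_ckpt, map_location="cpu")
    assert text_sd.keys() == image_sd.keys(), "checkpoints must share an architecture"
    fused = {}
    for name, text_param in text_sd.items():
        # Weighted average of the text-only and text-to-image parameters.
        fused[name] = alpha * text_param + (1.0 - alpha) * image_sd[name]
    return fused

if __name__ == "__main__":
    # Hypothetical checkpoint files; replace with real paths.
    fused_sd = average_state_dicts("text_model.pt", "image_model.pt", alpha=0.5)
    torch.save(fused_sd, "fused_model.pt")
```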
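The Open Datasets row lists a 30B-token mixture drawn from several public text corpora. The sketch below shows one way to sample documents from a weighted mixture until a token budget is reached; the corpus names mirror the row above, but the mixture weights and the document loader are hypothetical, since the paper does not state its exact sampling proportions.

```python
# Minimal sketch: sample pre-tokenized documents from a weighted mixture of
# corpora until a target token budget is reached. The weights and the dummy
# document loader are hypothetical placeholders.
import random
from typing import Iterator, List

CORPORA = {                      # hypothetical mixture weights
    "common_crawl": 0.60,
    "c4": 0.20,
    "wikipedia": 0.08,
    "books3": 0.07,
    "arxiv": 0.05,
}
TARGET_TOKENS = 30_000_000_000   # 30B text tokens, as stated in the paper

def iter_documents(corpus_name: str) -> Iterator[List[int]]:
    """Placeholder loader: yields dummy token-id lists for one corpus."""
    rng = random.Random(hash(corpus_name) & 0xFFFF)
    while True:
        yield [rng.randrange(32_000) for _ in range(rng.randint(128, 2048))]

def sample_mixture(seed: int = 0) -> Iterator[List[int]]:
    rng = random.Random(seed)
    names = list(CORPORA)
    weights = [CORPORA[n] for n in names]
    streams = {n: iter_documents(n) for n in names}
    produced = 0
    while produced < TARGET_TOKENS:
        corpus = rng.choices(names, weights=weights, k=1)[0]
        doc = next(streams[corpus])
        produced += len(doc)
        yield doc
```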
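The Experiment Setup row reports an initial learning rate of 3e-5, 500 warm-up steps, an 8M-token batch size, and 5960 total steps. The sketch below wires those reported values into an optimizer and learning-rate schedule; the choice of AdamW and of a linear decay after warm-up are assumptions, as the row does not specify the optimizer or the decay shape, and the stand-in model and dummy loss are for illustration only.

```python
# Minimal sketch: optimizer and LR schedule using the reported values
# (initial lr 3e-5, 500 warm-up steps, 5960 total steps). AdamW and the
# linear decay after warm-up are assumptions, not confirmed by the row above.
import torch
from torch.optim.lr_scheduler import LambdaLR

PEAK_LR = 3e-5
WARMUP_STEPS = 500
TOTAL_STEPS = 5960

def lr_lambda(step: int) -> float:
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)                          # linear warm-up
    return max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))  # linear decay

model = torch.nn.Linear(8, 8)                  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=PEAK_LR)
scheduler = LambdaLR(optimizer, lr_lambda)

for step in range(TOTAL_STEPS):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 8)).pow(2).mean()   # dummy loss for illustration
    loss.backward()
    optimizer.step()
    scheduler.step()
```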