UNIFIED-IO: A Unified Model for Vision, Language, and Multi-modal Tasks

Authors: Jiasen Lu, Christopher Clark, Rowan Zellers, Roozbeh Mottaghi, Aniruddha Kembhavi

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now present results for UNIFIED-IO on the GRIT benchmark (Sec 4.1), ablate training data via the GRIT ablation benchmark (Sec 4.2), and evaluate UNIFIED-IO on 16 other benchmarks in computer vision and NLP (Sec 4.3).
Researcher Affiliation | Collaboration | Allen Institute for AI; University of Washington, Seattle
Pseudocode | No | Not found. The paper includes architectural diagrams and descriptions but no explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code and demos for UNIFIED-IO are available at unified-io.allenai.org.
Open Datasets | Yes | To fully test this capability, we gather 95 vision, language, and multi-modal datasets from 62 publicly available data sources as targets for our model to learn during multi-task training.
Dataset Splits | No | Not found. The paper describes multi-task training on a large dataset mixture and evaluation on various benchmarks, but it does not explicitly specify the validation splits used during its own training, beyond relying on the test splits of established benchmarks.
Hardware Specification | No | Not found. The paper mentions training several model sizes with different batch sizes and parallelization strategies but does not specify the hardware (e.g., GPU or CPU models) used for training.
Software Dependencies | No | Not found. The paper mentions using VQ-GAN and the Adafactor optimizer and cites their respective papers, but it does not specify software versions for these or for other dependencies such as Python, PyTorch/TensorFlow, or CUDA.
Experiment Setup | Yes | We use a learning rate of 10^-2 for the first 10,000 steps and then decay at a rate of 1/√k. We train with β1 = 0.9 and β2 = 1.0 − k^(−0.8), where k is the step number. We use global norm gradient clipping at 1.0 and find this is crucial to stabilize XL training. We train the Small, Base, and Large models with a batch size of 2048 and the XL model with a batch size of 1024 due to memory considerations. ... For all models, we train 1,000k and 500k steps for pre-training and multi-task training, respectively.
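For concreteness, the quoted optimizer schedule can be written out as a short Python sketch. This is not the authors' code: it assumes the decay is the usual inverse-square-root rule (1/√k), anchored so the rate stays continuous at the end of the 10,000-step warmup, and that β2 follows the Adafactor default 1 − k^(−0.8). Function names such as `learning_rate` and `adafactor_beta2` are illustrative only.

```python
# Minimal sketch of the learning-rate and Adafactor beta2 schedules quoted above.
# Assumptions (not stated verbatim in the paper excerpt): inverse-square-root decay
# anchored at the end of warmup, and the standard Adafactor beta2 schedule.

import math

WARMUP_STEPS = 10_000
PEAK_LR = 1e-2
CLIP_NORM = 1.0  # global-norm gradient clipping, reported as crucial for XL stability


def learning_rate(step: int) -> float:
    """Constant 1e-2 during the 10,000-step warmup, then decay proportional to 1/sqrt(step)."""
    if step <= WARMUP_STEPS:
        return PEAK_LR
    return PEAK_LR * math.sqrt(WARMUP_STEPS) / math.sqrt(step)


def adafactor_beta2(step: int) -> float:
    """Second-moment decay beta2 = 1 - k^(-0.8) for step k >= 1 (Adafactor default)."""
    return 1.0 - step ** -0.8


if __name__ == "__main__":
    # Print the schedule at a few representative steps.
    for k in (1, 10_000, 100_000, 1_000_000):
        print(f"step {k:>9}: lr={learning_rate(k):.2e}  beta2={adafactor_beta2(k):.6f}")
```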