UNIFIED-IO: A Unified Model for Vision, Language, and Multi-modal Tasks
Authors: Jiasen Lu, Christopher Clark, Rowan Zellers, Roozbeh Mottaghi, Aniruddha Kembhavi
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now present results for UNIFIED-IO on the GRIT benchmark (Sec 4.1), ablate training data via the GRIT ablation benchmark (Sec 4.2) and evaluate UNIFIED-IO on 16 other benchmarks in computer vision and NLP (Sec 4.3). |
| Researcher Affiliation | Collaboration | Allen Institute for AI; University of Washington, Seattle |
| Pseudocode | No | Not found. The paper includes architectural diagrams and descriptions but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and demos for UNIFIED-IO are available at: unified-io.allenai.org |
| Open Datasets | Yes | To fully test this capability, we gather 95 vision, language, and multi-modal datasets from 62 publicly available data sources as targets for our model to learn during multi-task training. |
| Dataset Splits | No | Not found. The paper mentions multi-task training on a large dataset and evaluation on various benchmarks, but it does not explicitly provide details about the specific validation dataset splits used during its own training process, beyond relying on the test splits of established benchmarks. |
| Hardware Specification | No | Not found. The paper mentions training on various model sizes with different batch sizes and parallelization strategies but does not specify details like GPU or CPU models used for training. |
| Software Dependencies | No | Not found. The paper mentions using VQ-GAN and Adafactor optimizer and references their respective papers, but it does not specify software versions for these or other dependencies like Python, PyTorch/TensorFlow, or CUDA. |
| Experiment Setup | Yes | We use a learning rate of 10^-2 for the first 10,000 steps and then decay at a rate of 1/√k. We train with β1 = 0.9 and β2 = 1.0 − k^(−0.8), where k is the step number. We use global norm gradient clipping with 1.0 and find this is crucial to stabilize XL training. We train the Small, Base and Large models with a batch size of 2048 and XL with a batch size of 1024 due to memory considerations. ... For all models, we train 1000k steps: 500k each for pre-training and multi-task training. |
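
The quoted schedule is the T5/Adafactor-style recipe: a constant peak learning rate of 10^-2 for 10k warmup steps, then inverse-square-root decay (note that 1/√10000 = 10^-2, so the two phases join continuously), with a step-dependent second-moment decay for Adafactor. The sketch below is a minimal illustration of that reading of the setup, not the authors' code; the function names and the `config` dictionary are hypothetical.

```python
def learning_rate(step: int, warmup_steps: int = 10_000) -> float:
    """Constant 1e-2 for the first `warmup_steps`, then 1/sqrt(k) decay.

    Because 1/sqrt(10_000) == 1e-2, the warmup plateau and the decay
    phase meet without a discontinuity.
    """
    return 1.0 / (max(step, warmup_steps) ** 0.5)


def adafactor_beta2(step: int) -> float:
    """Step-dependent second-moment decay: beta2 = 1.0 - k^(-0.8)."""
    return 1.0 - max(step, 1) ** -0.8


# Other settings quoted from the paper (illustrative grouping only).
config = {
    "beta1": 0.9,
    "global_norm_clip": 1.0,          # reported as crucial for stable XL training
    "batch_size": {"Small": 2048, "Base": 2048, "Large": 2048, "XL": 1024},
    "total_steps": 1_000_000,         # 500k pre-training + 500k multi-task training
}

if __name__ == "__main__":
    for k in (1_000, 10_000, 100_000, 1_000_000):
        print(f"step {k:>9}: lr={learning_rate(k):.5f}, beta2={adafactor_beta2(k):.5f}")
```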