MultiModalQA: Complex Question Answering over Text, Tables and Images
Authors: Alon Talmor, Ori Yoran, Amnon Catav, Dan Lahav, Yizhong Wang, Akari Asai, Gabriel Ilharco, Hannaneh Hajishirzi, Jonathan Berant
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We create 29,918 questions through this procedure, and empirically demonstrate the necessity of a multi-modal multi-hop approach to solve our task: our multi-hop model, ImplicitDecomp, achieves an average F1 of 51.7 over cross-modal questions, substantially outperforming a strong baseline that achieves 38.2 F1, but still lags significantly behind human performance, which is at 90.1 F1. |
| Researcher Affiliation | Collaboration | The Allen Institute for AI, Tel-Aviv University, University of Washington |
| Pseudocode | No | The paper describes the model components and their interaction in prose, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our dataset and code are available at https://allenai.github.io/multimodalqa. |
| Open Datasets | Yes | Our dataset and code are available at https://allenai.github.io/multimodalqa. (A hedged loading sketch follows the table.) |
| Dataset Splits | Yes | We split the dataset into 23,817 training, 2,441 development (dev.), and 3,660 test set examples. (A split-size sanity check follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions pre-trained models such as RoBERTa-large and ViLBERT-MT, but does not provide version numbers for any software dependencies, libraries, or frameworks used for implementation or experimentation. |
| Experiment Setup | No | The paper describes the training strategy and loss functions (e.g., cross-entropy loss) but does not provide specific experimental setup details such as hyperparameter values (learning rate, batch size, number of epochs), optimizer settings, or detailed training schedules. |
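
The reported split sizes can be checked against the paper's stated total of 29,918 questions. A minimal Python sketch using only the numbers quoted in the table above:

```python
# Split sizes as reported in the MultiModalQA paper.
splits = {"train": 23_817, "dev": 2_441, "test": 3_660}

total = sum(splits.values())
assert total == 29_918, f"unexpected total: {total}"

# Relative size of each split (roughly 80% / 8% / 12%).
for name, size in splits.items():
    print(f"{name}: {size} ({size / total:.1%})")
```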
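
Since the dataset and code are released at https://allenai.github.io/multimodalqa, a reproduction would typically start by loading the question files for each split. The sketch below is an assumption-laden illustration, not the authors' own code: the gzipped JSON Lines format and the MMQA_train.jsonl.gz / MMQA_dev.jsonl.gz file names are guesses to be verified against the actual release.

```python
import gzip
import json
from pathlib import Path

def load_jsonl_gz(path: Path) -> list[dict]:
    """Read a gzipped JSON Lines file into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Hypothetical file names; verify against https://allenai.github.io/multimodalqa.
data_dir = Path("multimodalqa/dataset")
train = load_jsonl_gz(data_dir / "MMQA_train.jsonl.gz")
dev = load_jsonl_gz(data_dir / "MMQA_dev.jsonl.gz")

# Cross-check against the split sizes reported in the paper.
print(len(train), len(dev))  # expected: 23817 and 2441
```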