MultiModalQA: Complex Question Answering over Text, Tables and Images

Authors: Alon Talmor, Ori Yoran, Amnon Catav, Dan Lahav, Yizhong Wang, Akari Asai, Gabriel Ilharco, Hannaneh Hajishirzi, Jonathan Berant

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We create 29,918 questions through this procedure, and empirically demonstrate the necessity of a multi-modal multi-hop approach to solve our task: our multi-hop model, ImplicitDecomp, achieves an average F1 of 51.7 over cross-modal questions, substantially outperforming a strong baseline that achieves 38.2 F1, but still lags significantly behind human performance, which is at 90.1 F1."
Researcher Affiliation | Collaboration | The Allen Institute for AI, Tel-Aviv University, University of Washington
Pseudocode | No | The paper describes the model components and their interaction in prose, but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | "Our dataset and code are available at https://allenai.github.io/multimodalqa."
Open Datasets | Yes | "Our dataset and code are available at https://allenai.github.io/multimodalqa."
Dataset Splits | Yes | "We split the dataset into 23,817 training, 2,441 development (dev.), and 3,660 test set examples."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions pre-trained models such as RoBERTa-large and ViLBERT-MT, but does not provide version numbers for any software dependencies, libraries, or frameworks used for implementation or experimentation.
Experiment Setup | No | The paper describes the training strategy and loss functions (e.g., cross-entropy loss) but does not report specific experimental setup details such as hyperparameter values (learning rate, batch size, number of epochs), optimizer settings, or detailed training schedules.
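The Research Type row above quotes average F1 scores (51.7 for ImplicitDecomp versus 38.2 for the baseline and 90.1 for humans). For reference, the following is a minimal sketch of the SQuAD-style token-overlap F1 commonly used in question-answering evaluation; it is an assumption that the paper's numbers follow this scheme, and the official MultiModalQA evaluation additionally handles list-valued answers, which this sketch ignores.

```python
from collections import Counter
import re
import string


def normalize(text):
    """Lowercase, strip punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction, gold):
    """Token-overlap F1 between a predicted answer string and a gold answer string."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; otherwise zero overlap.
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("the Eiffel Tower", "Eiffel Tower"))  # 1.0 after normalization
```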
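The Dataset Splits row reports 23,817 / 2,441 / 3,660 examples. The sketch below counts examples in the released question files to check those numbers; the gzipped-JSONL file names are assumptions based on the project release at https://allenai.github.io/multimodalqa and may differ from the actual distribution.

```python
import gzip

# Assumed file names for the released question sets (hypothetical; verify
# against the files actually published on the MultiModalQA project page).
SPLIT_FILES = {
    "train": "MMQA_train.jsonl.gz",
    "dev": "MMQA_dev.jsonl.gz",
    "test": "MMQA_test.jsonl.gz",
}

# Split sizes reported in the paper.
EXPECTED = {"train": 23817, "dev": 2441, "test": 3660}


def count_examples(path):
    """Count one JSON object per non-empty line in a gzipped JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


if __name__ == "__main__":
    for split, path in SPLIT_FILES.items():
        n = count_examples(path)
        status = "OK" if n == EXPECTED[split] else f"expected {EXPECTED[split]}"
        print(f"{split}: {n} examples ({status})")
```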