MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Authors: Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng Gao
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | With MATHVISTA, we have conducted a comprehensive, quantitative evaluation of 12 prominent foundation models. |
| Researcher Affiliation | Collaboration | UCLA, University of Washington, Microsoft Research, Redmond |
| Pseudocode | Yes | Figure 6: Two examples from GPT-4. GPT-4 depends on the quality of the generated caption and the detected OCR text. In (b), some information is incorrect, even though the final answer is correct. Panel (a) is captioned "Correct answer and code." |
| Open Source Code | No | The paper provides a project website (https://mathvista.github.io) but does not contain an explicit, unambiguous statement that the source code for the methodology is openly available or a direct link to a code repository for their work. |
| Open Datasets | Yes | We collected nine MathQA datasets in multimodal settings, including four for GPS, two for MWP with visual contexts of synthetic scenes, abstract diagrams, and tables, and two for TQA on college curricula (see C.4). ... We reviewed more than 70 datasets, collecting 19 of them that contain math-related instances and are publicly available, as listed in C.4. |
| Dataset Splits | Yes | MATHVISTA consists of 6,141 examples, divided into two subsets: testmini and test. testmini contains 1,000 examples, intended for model development validation or for those with limited computing resources. |
| Hardware Specification | No | The paper mentions specific models like GPT-4V and Bard, which are commercial products, but it does not provide specific hardware details (e.g., GPU models, CPU types, memory) used for their experiments. |
| Software Dependencies | No | The paper mentions software components and models like 'EasyOCR (Jaided AI, 2020)' and 'ChatGPT (OpenAI, 2022)' but does not provide specific version numbers for these software dependencies or libraries. |
| Experiment Setup | Yes | We provide the prompts for LLMs and the hyperparameters used for LMMs in Appendix F. |
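The Dataset Splits row gives the total size and the testmini size, which together imply the size of the test subset. The sketch below does that arithmetic, assuming testmini and test are disjoint and together cover all 6,141 examples (the quoted text says the benchmark is "divided into" the two subsets). The helper name `remaining_split_size` is hypothetical.

```python
# Split sizes as quoted in the Dataset Splits row.
TOTAL_EXAMPLES = 6_141
TESTMINI_EXAMPLES = 1_000


def remaining_split_size(total: int, testmini: int) -> int:
    """Return the test-subset size, assuming testmini and test partition the benchmark."""
    if not 0 <= testmini <= total:
        raise ValueError("testmini must be between 0 and total")
    return total - testmini


test_size = remaining_split_size(TOTAL_EXAMPLES, TESTMINI_EXAMPLES)
print(f"test: {test_size} examples ({test_size / TOTAL_EXAMPLES:.1%} of the benchmark)")
# prints: test: 5141 examples (83.7% of the benchmark)
```

Under this assumption, testmini is roughly one sixth of the benchmark, which fits its stated role as a lighter option for development or for limited compute.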
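The Pseudocode row quotes a figure noting that GPT-4's answers depend on the generated caption and the detected OCR text. In other words, a text-only LLM sees the image only through those two textual surrogates. The sketch below shows one way such a text-only input could be assembled. It is an illustration under assumptions: the `AugmentedInput` type, the `build_prompt` helper, the field labels, and the template are all hypothetical, and the prompts the authors actually used are the ones they report in Appendix F.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AugmentedInput:
    """Hypothetical container for the text an LLM receives in place of the image."""
    question: str
    caption: str  # generated image caption (source of the caption is an assumption)
    ocr_texts: List[str] = field(default_factory=list)  # strings from an OCR tool, e.g. EasyOCR


def build_prompt(item: AugmentedInput) -> str:
    """Join question, caption, and OCR strings into one text-only prompt (illustrative template)."""
    if item.ocr_texts:
        ocr_block = ", ".join(f'"{t}"' for t in item.ocr_texts)
    else:
        ocr_block = "none detected"
    return (
        f"Question: {item.question}\n"
        f"Image caption: {item.caption}\n"
        f"Detected OCR text: {ocr_block}\n"
        "Answer:"
    )


example = AugmentedInput(
    question="What is the value of x?",
    caption="A right triangle with legs labeled 3 and 4 and hypotenuse labeled x.",
    ocr_texts=["3", "4", "x"],
)
print(build_prompt(example))
```

Because the model never sees pixels, any error in the caption or OCR strings propagates directly into its reasoning, which is the dependence the quoted figure caption describes.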
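The Software Dependencies row is marked "No" because no version numbers are given. For locally installed Python packages, versions can be captured at run time with the standard library's `importlib.metadata`. The sketch below assumes the relevant tools are installed from PyPI under the distribution names shown (`easyocr`, `openai`); those names are assumptions, not dependencies stated in the paper. Hosted models such as ChatGPT are not pip packages, so their model identifier and query date would have to be logged separately.

```python
import json
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional


def record_versions(packages: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is not installed."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    # Distribution names here are illustrative assumptions.
    print(json.dumps(record_versions(["easyocr", "openai"]), indent=2))
```

Writing this mapping next to each results file is a lightweight way to make a later rerun comparable.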