Enhancing LLM Reasoning via Vision-Augmented Prompting
Authors: Ziyang Xiao, Dongxiang Zhang, Xiongwei Han, Xiaojin Fu, Wing Yin YU, Tao Zhong, Sai Wu, Yuan Wang, Jianwei Yin, Gang Chen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted across four versatile tasks, including solving geometry problems, Sudoku, time series prediction, and travelling salesman problem. The results validate the superiority of VAP over existing LLMs-based reasoning frameworks. |
| Researcher Affiliation | Collaboration | Ziyang Xiao¹, Dongxiang Zhang¹·⁴, Xiongwei Han², Xiaojin Fu², Wing Yin Yu², Tao Zhong², Sai Wu¹·⁴, Yuan Wang³, Jianwei Yin¹, Gang Chen¹. ¹ Zhejiang University; ² Huawei Noah's Ark Lab; ³ School of Business, Singapore University of Social Sciences; ⁴ Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides structured text describing prompts and a workflow illustration but not formal pseudocode. |
| Open Source Code | Yes | We have provided the experimental code and test data to facilitate reproduction. |
| Open Datasets | Yes | We randomly sample 200 problem instances from the Geometry Intersection Counting task in the BIG-bench benchmark [36]; We utilize the Sudoku puzzle generation program from BIG-bench to create a dataset that includes 150 Sudoku puzzles; The dataset for this task is sourced from the Darts library [41], which includes a curated collection of 8 real univariate time series datasets. |
| Dataset Splits | No | The paper mentions generating problem instances and test sets for evaluation but does not specify training/validation/test splits, split percentages, or any cross-validation protocol for the tasks. For the LLM-based methods, problem instances serve as evaluation data rather than training data. |
| Hardware Specification | No | The paper states it uses GPT-4-vision-preview as the underlying MLLM, which is an API service. In the NeurIPS checklist, it explicitly states: 'The paper does not provided resource usage in experiment because the method we proposed is LLMs-based, which solely involved API calls to these models and is not computationally intensive at all. Any computer with network communication capabilities could execute our method.' |
| Software Dependencies | No | The paper mentions software tools such as Python Turtle, Matplotlib, DALL·E 3, GPT-4V(ision), GPT-4, and LLaMA 3 8B, but it does not specify version numbers for these components or libraries, which would be needed for a reproducible description of the software environment. |
| Experiment Setup | Yes | The default temperature is set to 0. For methods that require sampling, such as SC, the temperature is set to 0.7. Additionally, for CoT-SC prompting... The default k is set to 10. |
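The sampling setup quoted in the Experiment Setup row (temperature 0 for greedy decoding, temperature 0.7 with k = 10 for self-consistency voting) can be sketched as below. This is a minimal illustration, not the authors' code: `query_model` is a stub standing in for an actual LLM API call (the paper uses GPT-4V), and its answer distribution is invented purely for demonstration.

```python
from collections import Counter
import random

def query_model(prompt: str, temperature: float) -> str:
    """Stub for an LLM call; a real implementation would hit an API endpoint.
    The returned answers here are invented for illustration only."""
    if temperature == 0.0:
        return "7"  # greedy decoding: deterministic answer
    # Sampling at temperature > 0: answers vary across calls.
    return random.choice(["7"] * 8 + ["9"] * 2)

def self_consistency(prompt: str, k: int = 10, temperature: float = 0.7) -> str:
    """CoT-SC as described: draw k sampled answers, then majority-vote."""
    answers = [query_model(prompt, temperature) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]

random.seed(0)  # for a reproducible demo run
print(self_consistency("How many intersection points do the shapes have?"))
```

With k = 1 and temperature 0 this reduces to the default greedy setting; the k = 10 majority vote is what the CoT-SC baseline adds on top.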