Enhancing LLM Reasoning via Vision-Augmented Prompting

Authors: Ziyang Xiao, Dongxiang Zhang, Xiongwei Han, Xiaojin Fu, Wing Yin YU, Tao Zhong, Sai Wu, Yuan Wang, Jianwei Yin, Gang Chen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted across four versatile tasks, including solving geometry problems, Sudoku, time series prediction, and travelling salesman problem. The results validate the superiority of VAP over existing LLMs-based reasoning frameworks.
Researcher Affiliation | Collaboration | Ziyang Xiao (1), Dongxiang Zhang (1, 4), Xiongwei Han (2), Xiaojin Fu (2), Wing Yin Yu (2), Tao Zhong (2), Sai Wu (1, 4), Yuan Wang (3), Jianwei Yin (1), Gang Chen (1). Affiliations: (1) Zhejiang University; (2) Huawei Noah's Ark Lab; (3) School of Business, Singapore University of Social Sciences; (4) Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides structured text describing prompts and a workflow illustration but not formal pseudocode.
Open Source Code | Yes | We have provided the experimental code and test data to facilitate reproduction.
Open Datasets | Yes | We randomly sample 200 problem instances from Geometry Intersection Counting task in the BIG-bench benchmark [36]; We utilize the Sudoku puzzle generation program from the BIG-bench to create a dataset that includes 150 Sudoku puzzles; The dataset for this task is sourced from the Darts library [41], which includes a curated collection of 8 real univariate time series datasets.
Dataset Splits | No | The paper mentions generating problem instances and test sets for evaluation but does not specify clear training/validation/test splits, specific percentages, or how cross-validation was applied for all tasks. For LLM-based methods, it refers to problem instances as the data for evaluation rather than training.
Hardware Specification | No | The paper states it uses GPT-4-vision-preview as the underlying MLLM, which is an API service. In the NeurIPS checklist, it explicitly states: 'The paper does not provided resource usage in experiment because the method we proposed is LLMs-based, which solely involved API calls to these models and is not computationally intensive at all. Any computer with network communication capabilities could execute our method.'
Software Dependencies | No | The paper mentions software tools like Python Turtle, Matplotlib, DALL·E 3, GPT-4V(ision), GPT-4, and LLaMA 3 8B, but it does not specify version numbers for these software components or libraries, which are necessary for reproducible descriptions of ancillary software.
Experiment Setup | Yes | The default temperature is set to 0. For methods that require sampling, such as SC, the temperature is set to 0.7. Additionally, for CoT-SC prompting... The default k is set to 10.
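
The sampling settings in the Experiment Setup row can be illustrated with a minimal self-consistency (SC) sketch: draw several answers at a non-zero temperature and keep the most frequent one. This is not the authors' code. The function name `self_consistency`, the `sample_answer` callable, and the stub sampler are hypothetical, and reading k as the number of sampled answers is an assumption, since the excerpt elides the sentence that defines k.

```python
from collections import Counter
from typing import Callable, Iterable

# Values quoted in the Experiment Setup row.
DEFAULT_TEMPERATURE = 0.0  # deterministic decoding for non-sampling methods
SC_TEMPERATURE = 0.7       # used for methods that require sampling, such as SC
DEFAULT_K = 10             # ASSUMPTION: k interpreted as the number of samples


def self_consistency(
    sample_answer: Callable[[str, float], str],
    prompt: str,
    k: int = DEFAULT_K,
    temperature: float = SC_TEMPERATURE,
) -> str:
    """Sample k answers and return the most frequent one (majority vote)."""
    answers = [sample_answer(prompt, temperature) for _ in range(k)]
    # most_common(1) yields the top answer; ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]


def make_stub_sampler(canned: Iterable[str]) -> Callable[[str, float], str]:
    """Hypothetical stand-in for a model call: replays canned answers in order."""
    it = iter(canned)

    def sampler(prompt: str, temperature: float) -> str:
        return next(it)

    return sampler


if __name__ == "__main__":
    stub = make_stub_sampler(["3", "4", "3", "3", "5"])
    print(self_consistency(stub, "How many intersections?", k=5))
```

In practice `sample_answer` would wrap an API call to the underlying model with the given temperature; the stub only keeps the sketch runnable offline.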