Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models
Authors: Wenshan Wu, Shaoguang Mao, Yadong Zhang, Yan Xia, Li Dong, Lei Cui, Furu Wei
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrated that VoT significantly enhances the spatial reasoning abilities of LLMs. |
| Researcher Affiliation | Collaboration | Wenshan Wu, Shaoguang Mao, Yadong Zhang, Yan Xia, Li Dong, Lei Cui, Furu Wei (Microsoft Research; East China Normal University) |
| Pseudocode | Yes | Algorithm 1: Navigation Map Generation |
| Open Source Code | Yes | Please find the dataset and code on our project page. |
| Open Datasets | Yes | The data and code associated with this study are publicly available, and the link is provided in the paper. |
| Dataset Splits | Yes | Visual Navigation: We generate 496 navigation maps and 2520 QA instances in total, covering various map sizes up to 7×9 and 9×7. The data distribution is provided in Table 4 in the appendix. |
| Hardware Specification | No | API settings are temperature 0 (greedy decoding) and top-p 1, with model versions 1106-preview and vision-preview. |
| Software Dependencies | Yes | Specifically, we adopt GPT-4 [OA+23] and GPT-4 Vision [Ope23] via the Azure OpenAI API, as they are the state-of-the-art LLM and multimodal model, respectively. API settings are temperature 0 (greedy decoding) and top-p 1, with model versions 1106-preview and vision-preview. |
| Experiment Setup | Yes | API settings are temperature 0 (greedy decoding) and top-p 1, with model versions 1106-preview and vision-preview. For all experiments we adopt zero-shot prompting. |
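
The rows above pin down every inference parameter the paper reports: temperature 0 (greedy decoding), top-p 1, model versions 1106-preview and vision-preview, zero-shot prompting, all via the Azure OpenAI API. The sketch below shows a minimal reproduction of that configuration with the Azure OpenAI Python SDK; the endpoint, API version, deployment name, and prompt text are illustrative placeholders, since the paper does not publish them.

```python
# Minimal sketch of the reported inference setup. Only temperature=0, top_p=1,
# zero-shot prompting, and the 1106-preview model version come from the paper;
# everything else (endpoint, API version, prompt) is an assumed placeholder.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # assumption: any current Azure API version
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
)

response = client.chat.completions.create(
    # On Azure this is the deployment name; assumed to map to gpt-4-1106-preview.
    model="gpt-4-1106-preview",
    messages=[
        # Zero-shot prompting as reported: a single task prompt, no exemplars.
        {"role": "user", "content": "Navigate the map step by step ..."}  # illustrative
    ],
    temperature=0,  # greedy decoding, as reported
    top_p=1,        # as reported
)
print(response.choices[0].message.content)
```

With temperature 0 and top-p 1, decoding is effectively deterministic, which is what makes the paper's evaluation runs repeatable without any hardware specification beyond API access.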