Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
Authors: Pan Lu, Baolin Peng, Hao Cheng, Michel Galley, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Jianfeng Gao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We showcase the effectiveness of Chameleon on two multi-modal knowledge-intensive reasoning tasks: ScienceQA and TabMWP. Chameleon, powered by GPT-4, achieves an 86.54% overall accuracy on ScienceQA, improving the best published few-shot result by 11.37%. On TabMWP, GPT-4-powered Chameleon improves the accuracy by 17.0%, lifting the state of the art to 98.78%. |
| Researcher Affiliation | Collaboration | Pan Lu¹, Baolin Peng², Hao Cheng², Michel Galley², Kai-Wei Chang¹, Ying Nian Wu¹, Song-Chun Zhu¹, Jianfeng Gao² — ¹University of California, Los Angeles; ²Microsoft Research, Redmond |
| Pseudocode | No | The paper describes program generation and execution but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a project website link (https://chameleon-llm.github.io) but does not explicitly state that the source code for their methodology is available at this link or elsewhere. |
| Open Datasets | Yes | We assess Chameleon's effectiveness and adaptability on two complex reasoning tasks, ScienceQA [32] and TabMWP [33]. |
| Dataset Splits | No | The paper mentions using 'in-context examples as demonstrations' for the LLM-based models, which is a prompting strategy rather than a definition of training/validation/test splits. |
| Hardware Specification | No | The paper mentions using LLMs such as ChatGPT and GPT-4 accessed via API, but does not describe the hardware used to run the experiments. |
| Software Dependencies | No | The paper mentions software components such as GPT-4, Bing search, and a Python interpreter for program execution, but does not list versions or provide a dependency specification. |
| Experiment Setup | Yes | The maximum length for generated programs is set to 128, and the temperature is set to 0 for the most deterministic generation. By default, the LLM-based models use four in-context examples as demonstrations, have a temperature setting of 0, and allow a maximum of 512 tokens for completion. |
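The Experiment Setup row above reports the decoding configuration (temperature 0, four in-context demonstrations, a 128-token cap for generated programs, and a 512-token cap for other completions). The snippet below is a minimal sketch, not the authors' released code, of how those settings could map onto an LLM API call; it assumes the pre-1.0 `openai` Python SDK, and the prompt-building helper and placeholder demonstrations are hypothetical.

```python
# Sketch only: reproduces the reported decoding settings, not the Chameleon pipeline.
import openai

# Four in-context demonstrations, per the paper's default setup (contents are placeholders).
FEW_SHOT_EXAMPLES = [
    {"question": "...", "solution": "..."},
    {"question": "...", "solution": "..."},
    {"question": "...", "solution": "..."},
    {"question": "...", "solution": "..."},
]

def build_prompt(question: str) -> str:
    """Concatenate the demonstrations and the test question into one few-shot prompt."""
    demos = "\n\n".join(
        f"Question: {ex['question']}\nSolution: {ex['solution']}"
        for ex in FEW_SHOT_EXAMPLES
    )
    return f"{demos}\n\nQuestion: {question}\nSolution:"

def generate(question: str, program_only: bool = False) -> str:
    """Query the model with temperature 0 and the token caps reported in the paper."""
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": build_prompt(question)}],
        temperature=0,                            # most deterministic generation
        max_tokens=128 if program_only else 512,  # 128 for programs, 512 otherwise
    )
    return response["choices"][0]["message"]["content"]
```

Setting the temperature to 0 and fixing the demonstrations makes the runs as repeatable as the underlying API allows, which is why these values matter for reproducibility even though no training is involved.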