CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets
Authors: Lifan Yuan, Yangyi Chen, Xingyao Wang, Yi Fung, Hao Peng, Heng Ji
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on vision-language, tabular processing, and mathematical reasoning tasks show that our approach achieves substantial improvements compared to strong baselines. |
| Researcher Affiliation | Academia | Lifan Yuan, Yangyi Chen, Xingyao Wang, Yi R. Fung, Hao Peng, Heng Ji; University of Illinois Urbana-Champaign; {lievanyuan173}@gmail.com, {yangyic3,xingyao6,yifung2,haopeng,hengji}@illinois.edu |
| Pseudocode | No | The paper includes code snippets as examples of tools, but does not present structured pseudocode or algorithm blocks for its main methodology. |
| Open Source Code | Yes | The code is available at https://github.com/lifan-yuan/CRAFT. |
| Open Datasets | Yes | We use three complex visual reasoning datasets, including GQA (Hudson & Manning, 2019), OK-VQA (Marino et al., 2019), and A-OKVQA (Schwenk et al., 2022); we use TabMWP (Lu et al., 2023)...; we use the algebra subset of MATH (Hendrycks et al., 2021)...; we adopt LLaVA (Liu et al., 2023a)... and COCO-2017 (Lin et al., 2014). |
| Dataset Splits | No | The paper mentions a 'validation step' for tool creation and notes that LATM uses 'validation samples', but it does not give train/validation/test split percentages or counts for its main experiments on GQA, OK-VQA, A-OKVQA, TabMWP, or MATH. |
| Hardware Specification | No | The paper mentions the use of 'GPT-3.5-Turbo' and 'GPT-4' as backbone models and the cost of toolset construction, but it does not specify any hardware details like GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions software such as Python, pandas, sympy, numpy, scipy, scikit-image, mahotas, SimCSE, BM25, and the Lizard Python library, but it does not provide version numbers for any of these dependencies. |
| Experiment Setup | Yes | In this work, we empirically set the number of retrieved tools k to 10 for q_t, 5 for f_t, and 10 for d_t; we sample 2,000 problems from the above instruction datasets, with 1,000 drawn in the primary random sampling epoch and another 1,000 from the subsequent 10 epochs, each of which contributes 100 problems. |
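
The per-view retrieval depths quoted in the Experiment Setup row (k = 10, 5, 10) can be illustrated with a minimal sketch. Several assumptions are made and none of them come from the table: q_t, f_t and d_t are treated as three query views of the same problem (the problem text, a guessed function name, and a guessed docstring); the SimCSE embeddings listed under Software Dependencies are swapped for a plain bag-of-words cosine so the example stays self-contained; and the per-view results are merged by counting how many views retrieved each tool, a hypothetical voting rule. The names `retrieve_tools` and `bow_cosine` are illustrative and are not taken from the CRAFT repository.

```python
from collections import Counter
import math

# Retrieval depths quoted in the table: k = 10 for q_t, 5 for f_t, 10 for d_t.
K_PER_VIEW = {"q_t": 10, "f_t": 5, "d_t": 10}


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words counts (assumed stand-in for SimCSE embeddings)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def retrieve_tools(queries, toolset, k_per_view=K_PER_VIEW, final_k=3):
    """Hypothetical multi-view retrieval.

    queries: view name -> query text for that view.
    toolset: tool name -> text describing the tool (name plus docstring).
    Each view keeps its own top-k tools; the final ranking orders tools by the
    number of views that retrieved them (assumed voting rule), breaking ties by
    the summed similarity across those views.
    """
    votes, sim_sum = Counter(), Counter()
    for view, query in queries.items():
        scores = {tool: bow_cosine(query, text) for tool, text in toolset.items()}
        top = sorted(scores, key=scores.get, reverse=True)[: k_per_view[view]]
        for tool in top:
            votes[tool] += 1
            sim_sum[tool] += scores[tool]
    ranked = sorted(votes, key=lambda t: (votes[t], sim_sum[t]), reverse=True)
    return ranked[:final_k]


toolset = {
    "count_objects": "count_objects count the number of objects of a category in an image",
    "compare_colors": "compare_colors check whether two objects share the same color",
    "solve_linear": "solve_linear solve a linear equation for x",
}
queries = {
    "q_t": "how many dogs are in the image",
    "f_t": "count_objects",
    "d_t": "count the number of dogs in an image",
}
print(retrieve_tools(queries, toolset))
```

With only three tools every view retrieves all of them at the quoted depths, so the vote counts tie and the summed similarity decides the order; shrinking each view's k to 1 shows the voting rule at work, since only `count_objects` is then retrieved by any view. Using a different k per view lets a noisier view (here f_t, a bare function name) contribute fewer candidates.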
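
The sampling budget in the Experiment Setup row adds up as 1,000 + 10 × 100 = 2,000 problems. The sketch below lays out that schedule. It is a simplification under stated assumptions: all problems are drawn uniformly at random without replacement from a single pool, whereas the table does not say how the paper selects problems in the epochs after the primary random one. The function name `sample_schedule` and its parameters are hypothetical.

```python
import random


def sample_schedule(pool, first_epoch=1000, later_epochs=10, per_later_epoch=100, seed=0):
    """Hypothetical schedule: one large first epoch, then several small epochs.

    Assumption: every problem is drawn uniformly at random without replacement,
    so no problem appears in more than one epoch.
    """
    rng = random.Random(seed)
    total = first_epoch + later_epochs * per_later_epoch  # 1,000 + 10 * 100 = 2,000
    drawn = rng.sample(pool, total)
    epochs = [drawn[:first_epoch]]
    for i in range(later_epochs):
        start = first_epoch + i * per_later_epoch
        epochs.append(drawn[start:start + per_later_epoch])
    return epochs


epochs = sample_schedule(list(range(5000)))
print(len(epochs), [len(e) for e in epochs])
```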