Position: LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks

Authors: Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Mudit Verma, Kaya Stechly, Siddhant Bhambri, Lucas Paul Saldyt, Anil B Murthy

ICML 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Table 1 shows that all the state of the art LLMs show dismal performance on PlanBench (Valmeekam et al., 2023b)." and "For the travel planning case study... Our preliminary results show (see Figure 5; additional results in (Gundawar et al., 2024)) that LLM-Modulo based agentification with automated critics in the loop significantly improves the performance (6x of baselines) even with a limit of 10 back-prompting cycles, and weaker models such as GPT-3.5-turbo." |
| Researcher Affiliation | Academia | "School of Computing and AI, Arizona State University, Tempe, AZ, USA." |
| Pseudocode | No | The paper contains no structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the LLM-Modulo framework it describes; it cites prior works and benchmarks, but there is no code-release statement or link for the current methodology. |
| Open Datasets | Yes | "Table 1 shows that all the state of the art LLMs show dismal performance on PlanBench (Valmeekam et al., 2023b)." and "For the travel planning case study, we used a benchmark proposed in (Xie et al., 2024)." |
| Dataset Splits | No | The paper does not provide the dataset-split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types and speeds, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions tools and models such as VAL, GPT-4, and GPT-3.5-turbo, but it does not name ancillary software with version numbers (e.g., library or solver versions). |
| Experiment Setup | No | The paper describes the LLM-Modulo framework and its case-study applications, but the main text contains no concrete experimental setup details such as hyperparameter values, training configurations, or system-level settings. |
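The back-prompting loop quoted in the Research Type row (an LLM generating candidate plans, automated critics in the loop, up to 10 cycles) is not given as code in the paper. A minimal sketch of that generate-test-critique pattern, with a toy `generate` function and `critics` list standing in for the LLM planner and the paper's automated verifiers (both names are hypothetical, not from the paper), might look like:

```python
# Minimal sketch of an LLM-Modulo generate-test loop with back-prompting.
# `generate` stands in for the LLM planner; each critic returns a critique
# string when it rejects a candidate, or None when it accepts.
from typing import Callable, Optional

def llm_modulo_loop(
    generate: Callable[[str], str],
    critics: list[Callable[[str], Optional[str]]],
    task_prompt: str,
    max_cycles: int = 10,  # the paper's case study caps back-prompting at 10
) -> Optional[str]:
    prompt = task_prompt
    for _ in range(max_cycles):
        candidate = generate(prompt)
        # Collect critiques from every critic that rejects the candidate.
        critiques = [c for critic in critics
                     if (c := critic(candidate)) is not None]
        if not critiques:
            return candidate  # all critics accept -> return the vetted plan
        # Back-prompt: append the critiques and ask the generator to retry.
        prompt = (task_prompt
                  + "\nPrevious attempt: " + candidate
                  + "\nCritiques: " + "; ".join(critiques))
    return None  # no plan passed all critics within the cycle budget

# Toy usage: a "planner" that fixes its output once it sees a critique.
def toy_generate(prompt: str) -> str:
    return "B A" if "Critiques" in prompt else "A B"

def order_critic(plan: str) -> Optional[str]:
    return None if plan.startswith("B") else "B must come before A"

print(llm_modulo_loop(toy_generate, [order_critic], "order A and B"))  # B A
```

The key design point the paper argues for is visible here: soundness comes from the critics, not the LLM, so the loop only ever returns a candidate that every verifier accepted.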