Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought

Authors: Yuki Wang, Gonzalo Gonzalez-Pumariega, Yash Sharma, Sanjiban Choudhury

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct extensive evaluation on various robot task benchmarks, including a novel game benchmark Robotouille, designed to simulate diverse cooking tasks in a kitchen environment.
Researcher Affiliation Academia Huaxiaoyue Wang Cornell University yukiwang@cs.cornell.edu Gonzalo Gonzalez-Pumariega Cornell University gg387@cornell.edu Yash Sharma Cornell University ys749@cornell.edu Sanjiban Choudhury Cornell University sanjibanc@cornell.edu
Pseudocode Yes Algorithm 1 Demo2Code: Generating task code from language instructions and demonstrations
Open Source Code Yes The project s website is at https://portal-cornell.github.io/demo2code/ Codebase is available here: https://github.com/portal-cornell/demo2code
Open Datasets Yes We introduce a novel, open-source simulator to simulate complex, long-horizon cooking tasks for a robot, e.g. making a burger by cutting lettuces and cooking patties. Unlike existing simulators that focus on simulating physics or sensors, Robotouille focuses on high level task planning and abstracts away other details. We build on a standard backend, PDDLGym [59], with a user-friendly game as the front end to easily collect demonstrations. For the experiment, we create a set of tasks, where each is associated with a set of preferences (e.g. what a user wants in the burger, how the user wants the burger cooked). For each task and each associated preference, we procedurally generate 10 scenarios. Codebase and usage guide for Robotouille is available here: https://github.com/portal-cornell/robotouille
Dataset Splits No We evaluate the different methods across three metrics.
Hardware Specification No We use gpt-3.5-turbo-16k for all experiments with temperature 0.
Software Dependencies No We use gpt-3.5-turbo-16k for all experiments with temperature 0.
Experiment Setup Yes We use gpt-3.5-turbo-16k for all experiments with temperature 0.