reproducibilityindex.ai

Demo2Code: From Summarizing Demonstrations to Synthesizing Code via Extended Chain-of-Thought

Authors: Yuki Wang, Gonzalo Gonzalez-Pumariega, Yash Sharma, Sanjiban Choudhury

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct extensive evaluation on various robot task benchmarks, including a novel game benchmark Robotouille, designed to simulate diverse cooking tasks in a kitchen environment.
Researcher Affiliation	Academia	Huaxiaoyue Wang Cornell University yukiwang@cs.cornell.edu Gonzalo Gonzalez-Pumariega Cornell University gg387@cornell.edu Yash Sharma Cornell University ys749@cornell.edu Sanjiban Choudhury Cornell University sanjibanc@cornell.edu
Pseudocode	Yes	Algorithm 1 Demo2Code: Generating task code from language instructions and demonstrations
Open Source Code	Yes	The project s website is at https://portal-cornell.github.io/demo2code/ Codebase is available here: https://github.com/portal-cornell/demo2code
Open Datasets	Yes	We introduce a novel, open-source simulator to simulate complex, long-horizon cooking tasks for a robot, e.g. making a burger by cutting lettuces and cooking patties. Unlike existing simulators that focus on simulating physics or sensors, Robotouille focuses on high level task planning and abstracts away other details. We build on a standard backend, PDDLGym [59], with a user-friendly game as the front end to easily collect demonstrations. For the experiment, we create a set of tasks, where each is associated with a set of preferences (e.g. what a user wants in the burger, how the user wants the burger cooked). For each task and each associated preference, we procedurally generate 10 scenarios. Codebase and usage guide for Robotouille is available here: https://github.com/portal-cornell/robotouille
Dataset Splits	No	We evaluate the different methods across three metrics.
Hardware Specification	No	We use gpt-3.5-turbo-16k for all experiments with temperature 0.
Software Dependencies	No	We use gpt-3.5-turbo-16k for all experiments with temperature 0.
Experiment Setup	Yes	We use gpt-3.5-turbo-16k for all experiments with temperature 0.