Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents
Authors: Wenlong Huang, Fei Xia, Dhruv Shah, Danny Driess, Andy Zeng, Yao Lu, Pete Florence, Igor Mordatch, Sergey Levine, Karol Hausman, Brian Ichter
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate how such grounded models can be obtained across three simulation and real-world domains, and that the proposed decoding strategy is able to solve complex, long-horizon embodiment tasks in a robotic setting by leveraging the knowledge of both models. ... Our contributions are as followed: ... 3) we show empirical evidence, across three simulation and real-world domains, that the proposed method performs strongly on a wide range of tasks while also significantly outperforming prior methods in efficiency. ... 4 Experiments |
| Researcher Affiliation | Collaboration | Wenlong Huang1 , Fei Xia2, Dhruv Shah3, Danny Driess2, Andy Zeng2, Yao Lu2, Pete Florence2, Igor Mordatch2, Sergey Levine2,3, Karol Hausman2, Brian Ichter2 1Stanford University, 2Google Deepmind, 3UC Berkeley |
| Pseudocode | Yes | Algorithm 1 Grounded Decoding (GD) w/ Greedy Search |
| Open Source Code | Yes | grounded-decoding.github.io |
| Open Datasets | Yes | Herein we experiment with a simulated tabletop manipulation environment based on RAVENS [95]. We create a custom set of 20 tasks, with 10 seen tasks and 10 unseen tasks. Seen tasks are used for training (for supervised baseline) or for few-shot prompting. ... We further evaluate the long-horizon reasoning of Grounded Decoding for 2D maze-solving on Minigrid [10]. |
| Dataset Splits | No | The paper distinguishes 'seen tasks' (used for training the supervised baseline or for few-shot prompting) from 'unseen tasks' (used for evaluation), but it does not describe a separate validation split or how one would be used. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU specifications, or memory, for running the experiments. It generally discusses training and environments without hardware specifics. |
| Software Dependencies | No | The paper mentions several software components (InstructGPT, PaLM, CLIPort, PPO, CLIP, and OWL-ViT) but does not specify their version numbers, which are necessary for reproducibility. |
| Experiment Setup | No | The paper mentions some aspects of the experimental setup, such as training on '50,000 pre-collected demonstrations' and using 'sparse outcome reward', but it does not provide specific hyperparameters like learning rates, batch sizes, or optimizer settings, which are crucial for detailed experimental reproducibility. |
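Algorithm 1 in the paper presents Grounded Decoding with greedy search: at each decoding step, the next token is chosen to maximize the product of the language model's probability and a grounded model's score. The sketch below illustrates this idea under stated assumptions; `llm_next_token_probs` and `grounded_score` are hypothetical stand-ins for the paper's language model and grounded model, not actual APIs from the released code.

```python
# Minimal sketch of Grounded Decoding (GD) with greedy search.
# Assumptions (hypothetical, for illustration only):
#   llm_next_token_probs(prompt, tokens) -> dict mapping candidate next
#       tokens to language-model probabilities given the prefix.
#   grounded_score(token, tokens) -> score from a grounded model (e.g.
#       affordance or feasibility) for appending `token` to the prefix.

def greedy_grounded_decode(prompt, llm_next_token_probs, grounded_score,
                           max_steps=20, eos="<eos>"):
    """Greedily pick the token maximizing p_LLM(w | prefix) * p_G(w)."""
    tokens = []
    for _ in range(max_steps):
        candidates = llm_next_token_probs(prompt, tokens)
        # Combine the two models multiplicatively and take the argmax.
        best = max(candidates,
                   key=lambda w: candidates[w] * grounded_score(w, tokens))
        tokens.append(best)
        if best == eos:
            break
    return tokens
```

For example, if the language model prefers an action that the grounded model scores as infeasible in the current scene, the product rule steers the greedy choice toward a token that both models support.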