Large Language Models as Generalizable Policies for Embodied Tasks

Authors: Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Rin Metcalf, Walter Talbott, Natalie Mackraz, R Devon Hjelm, Alexander T Toshev

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take as input text instructions and visual egocentric observations and output actions directly in the environment. Using reinforcement learning, we train LLaRP to see and act solely through environmental interactions. We show that LLaRP is robust to complex paraphrasings of task instructions and can generalize to new tasks that require novel optimal behavior. In particular, on 1,000 unseen tasks it achieves a 42% success rate, 1.7x the success rate of other common learned baselines or zero-shot applications of LLMs. (A minimal sketch of this setup follows the table.)
Researcher Affiliation | Industry | Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Walter Talbott, Katherine Metcalf, Natalie Mackraz, Devon Hjelm, Alexander Toshev (Apple)
Pseudocode | No | The paper describes the architecture and training process of LLaRP but does not include a formal pseudocode block or algorithm listing.
Open Source Code | Yes | Video examples of LLaRP in Language Rearrangement and the code are at https://llm-rl.github.io.
Open Datasets | Yes | Finally, to aid the community in studying language-conditioned, massively multi-task, embodied AI problems, we release a novel benchmark, Language Rearrangement, consisting of 150,000 training and 1,000 testing tasks for language-conditioned rearrangement.
Dataset Splits | No | The paper specifies 150,000 training tasks and 1,000 testing tasks for Language Rearrangement. While it evaluates generalization, it does not describe a separate validation split, with specific counts or percentages, used for hyperparameter tuning or model selection.
Hardware Specification | Yes | Unless specified otherwise, every method is trained using a full node of 8 A100-80GB GPUs and 96 Intel(R) Xeon(R) CPUs @ 2.20GHz.
Software Dependencies | No | The paper names specific models and components such as "LLaMA-7B V1", "Flan-T5-XL encoder", "GPT-3.5-Turbo", "IDEFICS", and "VC1 visual encoder", but it does not give version numbers for general software dependencies such as Python, PyTorch, or CUDA, which are needed for full reproducibility. (A sketch of how such versions could be recorded also follows the table.)
Experiment Setup | Yes | Hyperparameters for all reinforcement learning based methods are summarized in Table 3.
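
The Research Type row quotes the paper's core recipe: a frozen, pre-trained LLM consumes the text instruction together with egocentric visual observations, small trainable adapters map observations in and actions out, and all training is done with reinforcement learning. The sketch below illustrates that wiring in PyTorch. The class name, layer sizes, the generic TransformerEncoder standing in for the frozen LLaMA-7B backbone, and the paired actor/critic heads for PPO-style training are assumptions for illustration, not the authors' released implementation.

```python
# Minimal sketch of an LLaRP-style policy, assuming a frozen LLM backbone
# with trainable input/output adapters. All names and sizes are illustrative.
import torch
import torch.nn as nn

class LLaRPSketch(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, visual_dim=768,
                 n_actions=70):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Stand-in for the frozen pre-trained LLM (LLaMA-7B V1 in the paper).
        self.llm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2,
        )
        for p in self.llm.parameters():
            p.requires_grad = False  # the backbone stays frozen
        # Trainable adapters: project visual features in, decode actions out.
        self.visual_proj = nn.Linear(visual_dim, d_model)
        self.action_head = nn.Linear(d_model, n_actions)  # actor logits
        self.value_head = nn.Linear(d_model, 1)           # critic for RL

    def forward(self, instruction_tokens, visual_feats):
        # instruction_tokens: (B, T_text); visual_feats: (B, T_obs, visual_dim)
        text = self.embed(instruction_tokens)
        obs = self.visual_proj(visual_feats)
        h = self.llm(torch.cat([text, obs], dim=1))
        last = h[:, -1]  # act from the representation of the latest observation
        return self.action_head(last), self.value_head(last)
```

An RL loop such as PPO would sample actions from a categorical distribution over the actor logits and update only visual_proj, action_head, and value_head, leaving the backbone untouched.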
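
On the Software Dependencies finding, a common mitigation when reproducing such a setup is to log the exact environment versions at training time so they can be pinned later. The helper below is hypothetical and not from the paper; it only shows the kind of record the assessment flags as missing.

```python
# Hypothetical helper (not from the paper): capture the interpreter and
# framework versions the reproducibility check flags as unspecified.
import platform
import torch

def log_environment() -> dict:
    return {
        "python": platform.python_version(),
        "torch": torch.__version__,
        "cuda": torch.version.cuda,               # None on CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # None if cuDNN is absent
    }

if __name__ == "__main__":
    for name, version in log_environment().items():
        print(f"{name:>6}: {version}")
```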