TEACh: Task-Driven Embodied Agents That Chat
Authors: Aishwarya Padmakumar, Jesse Thomason, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, Dilek Hakkani-Tur
AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose three benchmarks using TEACh to study embodied intelligence challenges, and we evaluate initial models' abilities in dialogue understanding, language grounding, and task execution. We evaluate a baseline Follower agent for the EDH and TfD benchmarks based on the Episodic Transformer (E.T.) model (Pashevich, Schmid, and Sun 2021) and demonstrate the difficulty of engineering rule-based solvers for end-to-end task completion. Table 4 summarizes our adapted E.T. model performance on the EDH and TfD benchmarks. |
| Researcher Affiliation | Collaboration | Aishwarya Padmakumar* 1, Jesse Thomason* 1 2, Ayush Shrivastava 3, Patrick Lange 1, Anjali Narayan-Chen 1, Spandana Gella 1, Robinson Piramuthu 1, Gokhan Tur 1, Dilek Hakkani-Tur 1. 1 Amazon Alexa AI; 2 USC Viterbi Department of Computer Science, University of Southern California; 3 Department of Electrical Engineering and Computer Science, University of Michigan |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Methods are described in prose and supported by diagrams, but no formal algorithm listings are present. |
| Open Source Code | Yes | We propose three benchmarks based on TEACh sessions to study the ability of learned models to achieve aspects of embodied intelligence: Execution from Dialog History (EDH), Trajectory from Dialog (TfD), and Two-Agent Task Completion (TATC). We evaluate a baseline Follower agent for the EDH and TfD benchmarks based on the Episodic Transformer (E.T.) model (Pashevich, Schmid, and Sun 2021) and demonstrate the difficulty of engineering rule-based solvers for end-to-end task completion. Footnote 1: https://github.com/alexa/teach |
| Open Datasets | Yes | We introduce TEACh, a dataset of over 3,000 human-human, interactive dialogues to complete household tasks in simulation. TEACh is comprised of 3,047 successful gameplay sessions, each of which can be replayed using the AI2-THOR simulator for model training, feature extraction, or model evaluation. Footnote 1 links to https://github.com/alexa/teach, which typically hosts the dataset and code. |
| Dataset Splits | Yes | Following ALFRED, we create validation and test splits in both seen and unseen environments (Table 3). Table 3: Session and EDH instances in TEACh data splits. Sessions: Train 1,482 (49%), Val Seen 181 (6%), Val Unseen 614 (20%). (The split percentages are rechecked in the sketch after this table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. It does not mention any cloud computing specifications either. |
| Software Dependencies | No | The paper mentions using the 'Episodic Transformer (E.T.) model' and 'Mask RCNN', as well as the 'AI2-THOR simulator' and 'spaCy tokenizer'. However, it does not provide specific version numbers for these software components or any other libraries like PyTorch or TensorFlow, which are necessary for reproducibility. |
| Experiment Setup | No | The paper states, 'The appendix details model hyperparameters.' However, it does not include specific experimental setup details, such as concrete hyperparameter values (learning rates, batch sizes, number of epochs, optimizer settings), in the main text as required by the criteria. |
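As a quick plausibility check on the Dataset Splits row, the minimal Python sketch below recomputes the quoted percentages against the 3,047 total sessions reported in the Open Datasets row. The test-split counts are not quoted above, so the remainder is only an inferred assumption, not a figure from the paper.

```python
# Sanity check of the TEACh split proportions quoted from Table 3.
# Per-split counts come from the Dataset Splits row above; the total of
# 3,047 sessions comes from the dataset description. Test-split counts are
# not quoted here, so the remainder is an assumption, not a reported value.

TOTAL_SESSIONS = 3047

splits = {
    "train": 1482,        # quoted as 49%
    "valid_seen": 181,    # quoted as 6%
    "valid_unseen": 614,  # quoted as 20%
}

for name, count in splits.items():
    print(f"{name}: {count} sessions ({count / TOTAL_SESSIONS:.0%})")

# Sessions not covered by the quoted rows (presumably the test splits).
remainder = TOTAL_SESSIONS - sum(splits.values())
print(f"remaining (inferred test seen + unseen): {remainder} "
      f"({remainder / TOTAL_SESSIONS:.0%})")
```

Running this reproduces the quoted 49% / 6% / 20% figures, which suggests the percentages in Table 3 are computed over gameplay sessions rather than EDH instances.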