TEACh: Task-Driven Embodied Agents That Chat
Authors: Aishwarya Padmakumar, Jesse Thomason, Ayush Shrivastava, Patrick Lange, Anjali Narayan-Chen, Spandana Gella, Robinson Piramuthu, Gokhan Tur, Dilek Hakkani-Tur
AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose three benchmarks using TEACh to study embodied intelligence challenges, and we evaluate initial models' abilities in dialogue understanding, language grounding, and task execution. We evaluate a baseline Follower agent for the EDH and TfD benchmarks based on the Episodic Transformer (E.T.) model (Pashevich, Schmid, and Sun 2021) and demonstrate the difficulty of engineering rule-based solvers for end-to-end task completion. Table 4 summarizes our adapted E.T. model performance on the EDH and TfD benchmarks. |
| Researcher Affiliation | Collaboration | Aishwarya Padmakumar* 1, Jesse Thomason* 1 2, Ayush Shrivastava 3, Patrick Lange 1, Anjali Narayan-Chen 1, Spandana Gella 1, Robinson Piramuthu 1, Gokhan Tur 1, Dilek Hakkani-Tur 1. 1 Amazon Alexa AI; 2 USC Viterbi Department of Computer Science, University of Southern California; 3 Department of Electrical Engineering and Computer Science, University of Michigan |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Methods are described in prose and supported by diagrams, but no formal algorithm listings are present. |
| Open Source Code | Yes | We propose three benchmarks based on TEACh sessions to study the ability of learned models to achieve aspects of embodied intelligence: Execution from Dialog History (EDH), Trajectory from Dialog (TfD), and Two-Agent Task Completion (TATC). We evaluate a baseline Follower agent for the EDH and TfD benchmarks based on the Episodic Transformer (E.T.) model (Pashevich, Schmid, and Sun 2021) and demonstrate the difficulty of engineering rule-based solvers for end-to-end task completion. Footnote 1: https://github.com/alexa/teach |
| Open Datasets | Yes | We introduce TEACh, a dataset of over 3,000 human-human, interactive dialogues to complete household tasks in simulation. TEACh is comprised of 3,047 successful gameplay sessions, each of which can be replayed using the AI2-THOR simulator for model training, feature extraction, or model evaluation. Footnote 1 links to https://github.com/alexa/teach, which typically hosts the dataset and code. |
| Dataset Splits | Yes | Following ALFRED, we create validation and test splits in both seen and unseen environments (Table 3). Table 3: Session and EDH instances in TEACh data splits. Sessions: Train 1,482 (49%), Val Seen 181 (6%), Val Unseen 614 (20%). (The split percentages are rechecked in the sketch after this table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments. It does not mention any cloud computing specifications either. |
| Software Dependencies | No | The paper mentions using the 'Episodic Transformer (E.T.) model' and 'Mask RCNN', as well as the 'AI2-THOR simulator' and 'spaCy tokenizer'. However, it does not provide specific version numbers for these software components or any other libraries like PyTorch or TensorFlow, which are necessary for reproducibility. |
| Experiment Setup | No | The paper states, 'The appendix details model hyperparameters.' However, it does not include specific experimental setup details, such as concrete hyperparameter values (learning rates, batch sizes, number of epochs, optimizer settings), in the main text as required by the criteria. |
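As a quick plausibility check on the Dataset Splits row, the minimal Python sketch below recomputes the quoted percentages against the 3,047 total sessions reported in the Open Datasets row. The test-split counts are not quoted above, so the remainder is only an inferred assumption, not a figure from the paper.

```python
# Sanity check of the TEACh split proportions quoted from Table 3.
# Per-split counts come from the Dataset Splits row above; the total of
# 3,047 sessions comes from the dataset description. Test-split counts are
# not quoted here, so the remainder is an assumption, not a reported value.

TOTAL_SESSIONS = 3047

splits = {
    "train": 1482,        # quoted as 49%
    "valid_seen": 181,    # quoted as 6%
    "valid_unseen": 614,  # quoted as 20%
}

for name, count in splits.items():
    print(f"{name}: {count} sessions ({count / TOTAL_SESSIONS:.0%})")

# Sessions not covered by the quoted rows (presumably the test splits).
remainder = TOTAL_SESSIONS - sum(splits.values())
print(f"remaining (inferred test seen + unseen): {remainder} "
      f"({remainder / TOTAL_SESSIONS:.0%})")
```

Running this reproduces the quoted 49% / 6% / 20% figures, which suggests the percentages in Table 3 are computed over gameplay sessions rather than EDH instances.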