Multi-Level Compositional Reasoning for Interactive Instruction Following

Authors: Suvaansh Bhambri, Byeonghwi Kim, Jonghyun Choi

AAAI 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "In our empirical evaluations with a long horizon instruction following task with the condition of not requiring additional depth supervision and perfect egomotion assumption, usually not available for real world deployment, we observe that MCR-Agent outperforms most prior arts in literature by large margins." |
| Researcher Affiliation | Academia | Yonsei University (suvaanshbhambri@gmail.com, byeonghwikim@yonsei.ac.kr, jc@yonsei.ac.kr) |
| Pseudocode | No | The paper describes the model architecture and components with equations but does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | "The code is available at https://github.com/yonseivnl/mcr-agent." |
| Open Datasets | Yes | "To evaluate our approach in challenging scenarios, we focus on the problem of interactive instruction following in the ALFRED benchmark (Shridhar et al. 2020), which poses numerous challenges including long-term planning, partial observability, and irreversible state changes." "The dataset is divided into three splits: train, validation, and test set." |
| Dataset Splits | Yes | "The dataset is divided into three splits: train, validation, and test set. To evaluate the generalisation ability of an embodied agent to novel environments, the benchmark further divides validation and test trajectories into seen and unseen splits." |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory specifications, or cloud instance types) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions various models and architectures (e.g., Bi-LSTM, ResNet, a pretrained object detector) but does not provide version numbers for any software dependencies or libraries (e.g., PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | No | The paper describes the overall training process and some architectural decisions but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed system-level training configurations. |