Multi-Level Compositional Reasoning for Interactive Instruction Following
Authors: Suvaansh Bhambri, Byeonghwi Kim, Jonghyun Choi
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our empirical evaluations with a long-horizon instruction following task, with the condition of not requiring additional depth supervision and the perfect egomotion assumption, usually not available for real-world deployment, we observe that MCR-Agent outperforms most prior arts in the literature by large margins. |
| Researcher Affiliation | Academia | Yonsei University suvaanshbhambri@gmail.com, byeonghwikim@yonsei.ac.kr, jc@yonsei.ac.kr |
| Pseudocode | No | The paper describes the model architecture and components with equations but does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | The code is available at https://github.com/yonseivnl/mcr-agent. |
| Open Datasets | Yes | To evaluate our approach in challenging scenarios, we focus on the problem of interactive instruction following in the ALFRED benchmark (Shridhar et al. 2020), which poses numerous challenges including long-term planning, partial observability, and irreversible state changes. The dataset is divided into three splits: train, validation, and test. |
| Dataset Splits | Yes | The dataset is divided into three splits: train, validation, and test. To evaluate the generalisation ability of an embodied agent to novel environments, the benchmark further divides validation and test trajectories into seen and unseen splits. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory specifications, or cloud instance types) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper mentions various models and architectures (e.g., Bi-LSTM, ResNet, pretrained object detector) but does not provide specific version numbers for any software dependencies or libraries (e.g., PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | No | The paper describes the overall training process and some architectural decisions but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed system-level training configurations. |
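The seen/unseen organisation noted in the Dataset Splits row follows ALFRED's public release, where validation and test trajectories are separated into environments seen during training and held-out unseen environments. A minimal sketch of inspecting such a split manifest (the key names mirror ALFRED's published splits; the trajectory entries are illustrative placeholders, not real dataset contents):

```python
# Toy split manifest in the style of ALFRED's published splits file.
# Key names mirror the benchmark's release; the trajectory entries
# are illustrative placeholders, not actual dataset contents.
splits = {
    "train":        [{"task": "trial_T0"}, {"task": "trial_T1"}, {"task": "trial_T2"}],
    "valid_seen":   [{"task": "trial_T3"}],   # rooms also present in train
    "valid_unseen": [{"task": "trial_T4"}],   # held-out rooms, tests generalisation
}

def split_sizes(manifest):
    """Return the number of trajectories in each split."""
    return {name: len(trajs) for name, trajs in manifest.items()}

print(split_sizes(splits))  # {'train': 3, 'valid_seen': 1, 'valid_unseen': 1}
```

Reporting per-split counts like this is a quick sanity check that seen and unseen trajectories stay disjoint when reproducing the evaluation protocol.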