FILM: Following Instructions in Language with Modular Methods
Authors: So Yeon Min, Devendra Singh Chaplot, Pradeep Kumar Ravikumar, Yonatan Bisk, Ruslan Salakhutdinov
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We explain the metrics, evaluation splits, and baselines against which FILM is compared. Furthermore, we describe training details of each of the learned components of FILM. Metrics: Success Rate (SR) is a binary indicator of whether all subtasks were completed. (A minimal SR computation is sketched after the table.) |
| Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, ²Facebook AI Research. {soyeonm, pradeepr, ybisk, rsalakhu}@cs.cmu.edu; dchaplot@fb.com |
| Pseudocode | Yes | Below, we present pseudocode for the deterministic navigation/interaction policy. We first present explanations of some terms. ... Algorithm 1: Navigation/interaction algorithm in an episode. (A hedged toy sketch of this navigate-then-interact loop follows the table.) |
| Open Source Code | Yes | Project webpage with code and pre-trained models: https://soyeonm.github.io/FILM_webpage/ ... Project webpage with code, pre-trained models, and protocols to reproduce results is released here: https://soyeonm.github.io/FILM_webpage/. |
| Open Datasets | Yes | On the ALFRED (Shridhar et al., 2020) benchmark, FILM achieves State-of-the-Art performance (24.46%) with a large margin (8% absolute) from the previous SOTA (Blukis et al., 2021). |
| Dataset Splits | Yes | The test set consists of Tests Seen (1533 episodes) and Tests Unseen (1529 episodes); the scenes of the latter consist entirely of rooms that do not appear in the training set, while those of the former consist only of scenes seen during training. Similarly, the validation set is partitioned into Valid Seen (820 episodes) and Valid Unseen (821 episodes). (The four split sizes are collected in a snippet after the table.) |
| Hardware Specification | No | The paper states that all learned models were trained using Ai2Thor (Kolve et al., 2019) but does not provide specific details about the hardware (e.g., GPU model, CPU, memory) used for training or experimentation. |
| Software Dependencies | No | In the LP module, BERT type classification and argument classification were trained with AdamW from the Transformers (Wolf et al., 2019) package; learning rates are 1e-6 for type classification and {1e-4, 1e-5, 5e-5, 5e-5} for each of the object, parent, recep, and sliced argument classifiers. ... Mask R-CNN (He et al., 2017) (and its implementation by Shridhar et al. (2021)) and the depth prediction method of Blukis et al. (2021). ... Fine-tuning a pre-trained BART (Lewis et al., 2020) model... While the paper mentions several software packages and models, it does not provide specific version numbers for these dependencies (e.g., PyTorch version, Transformers library version, or Python version). |
| Experiment Setup | Yes | Training Details of Learned Components: In the LP module, BERT type classification and argument classification were trained with AdamW from the Transformers (Wolf et al., 2019) package; learning rates are 1e-6 for type classification and {1e-4, 1e-5, 5e-5, 5e-5} for each of the object, parent, recep, and sliced argument classifiers. In the Semantic Mapping module, separate depth models for camera horizons of 45 and 0 were fine-tuned from an existing model of HLSM (Blukis et al., 2021), both with learning rate 1e-3 and the AdamW optimizer (epsilon 1e-6, weight decay 1e-2). Similarly, separate instance segmentation models for small and large objects were fine-tuned, starting from their respective parameters released by Shridhar et al. (2021), with learning rate 1e-3 and the SGD optimizer (momentum 0.9, weight decay 5e-4). Finally, the semantic search policy was trained with learning rate 5e-4 and the AdamW optimizer (epsilon 1e-6). (These optimizer settings are collected in a PyTorch sketch after the table.) |
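
The Research Type row quotes the paper's definition of Success Rate (SR). As a worked illustration, here is a minimal Python sketch of that metric, assuming episodes are represented as lists of per-subtask completion flags (our representation, not FILM's code):

```python
from typing import List

def episode_success(subtasks_completed: List[bool]) -> bool:
    """SR is binary per episode: success only if *all* subtasks were completed."""
    return all(subtasks_completed)

def success_rate(episodes: List[List[bool]]) -> float:
    """Average the binary success indicator over episodes."""
    if not episodes:
        return 0.0
    return sum(episode_success(e) for e in episodes) / len(episodes)

# Two episodes; only the first completes every subtask -> SR = 0.5
print(success_rate([[True, True], [True, False]]))  # 0.5
```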
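The Pseudocode row references Algorithm 1, the deterministic navigation/interaction policy. Below is a heavily hedged toy sketch of the navigate-then-interact loop structure only: the grid world, `INTERACT_RADIUS`, and all function names are our illustrative assumptions, not FILM's implementation (FILM navigates on a semantic top-down map).

```python
from dataclasses import dataclass
from typing import List, Tuple

INTERACT_RADIUS = 1  # assumption: interact once within one grid cell of the goal

@dataclass
class Agent:
    pos: Tuple[int, int]

    def step_toward(self, goal: Tuple[int, int]) -> None:
        """Take one deterministic step along each axis toward the goal."""
        x, y = self.pos
        gx, gy = goal
        self.pos = (x + (gx > x) - (gx < x), y + (gy > y) - (gy < y))

    def distance_to(self, goal: Tuple[int, int]) -> int:
        return max(abs(self.pos[0] - goal[0]), abs(self.pos[1] - goal[1]))

def run_episode(agent: Agent, subtasks: List[Tuple[str, Tuple[int, int]]]) -> None:
    """For each (action, goal) subtask: navigate until close, then interact."""
    for action, goal in subtasks:
        while agent.distance_to(goal) > INTERACT_RADIUS:
            agent.step_toward(goal)
        print(f"{action} at {goal}, agent at {agent.pos}")

# Usage: pick up an object, then put it down elsewhere (ALFRED-style action names)
run_episode(Agent(pos=(0, 0)), [("PickupObject", (3, 4)), ("PutObject", (6, 1))])
```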
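The Dataset Splits row lists four evaluation partitions. Collecting the quoted counts in one place (the split sizes come from the paper; the dict layout is ours):

```python
# ALFRED evaluation split sizes as quoted above.
ALFRED_SPLITS = {
    "tests_seen": 1533,    # scenes seen during training
    "tests_unseen": 1529,  # rooms that never appear in training
    "valid_seen": 820,
    "valid_unseen": 821,
}

total_eval = sum(ALFRED_SPLITS.values())
print(total_eval)  # 4703 evaluation episodes across the four splits
```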
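Finally, the Experiment Setup row reports per-component optimizer settings. A minimal PyTorch sketch collecting those hyperparameters; the placeholder modules stand in for the actual models, and only the optimizer hyperparameters come from the paper:

```python
import torch
import torch.nn as nn

# Placeholder modules; dimensions are arbitrary stand-ins, not FILM's architectures.
bert_type_clf = nn.Linear(768, 7)   # stand-in for the BERT task-type classifier
depth_model   = nn.Conv2d(3, 1, 3)  # stand-in for an HLSM-style depth model
seg_model     = nn.Conv2d(3, 1, 3)  # stand-in for a Mask R-CNN segmentation model
search_policy = nn.Linear(240, 73)  # stand-in for the semantic search policy

# LP module: BERT type classification, AdamW at lr 1e-6.
# (The argument classifiers use lrs {1e-4, 1e-5, 5e-5, 5e-5} for
#  object/parent/recep/sliced and are set up the same way.)
opt_type = torch.optim.AdamW(bert_type_clf.parameters(), lr=1e-6)

# Semantic Mapping: depth models fine-tuned with AdamW
# (lr 1e-3, eps 1e-6, weight decay 1e-2).
opt_depth = torch.optim.AdamW(depth_model.parameters(), lr=1e-3,
                              eps=1e-6, weight_decay=1e-2)

# Instance segmentation: fine-tuned with SGD (lr 1e-3, momentum 0.9, weight decay 5e-4).
opt_seg = torch.optim.SGD(seg_model.parameters(), lr=1e-3,
                          momentum=0.9, weight_decay=5e-4)

# Semantic search policy: AdamW (lr 5e-4, eps 1e-6).
opt_search = torch.optim.AdamW(search_policy.parameters(), lr=5e-4, eps=1e-6)
```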