FILM: Following Instructions in Language with Modular Methods
Authors: So Yeon Min, Devendra Singh Chaplot, Pradeep Kumar Ravikumar, Yonatan Bisk, Ruslan Salakhutdinov
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We explain the metrics, evaluation splits, and baselines against which FILM is compared. Furthermore, we describe training details of each of the learned components of FILM. Metrics: Success Rate (SR) is a binary indicator of whether all subtasks were completed. (A minimal SR computation is sketched after the table.) |
| Researcher Affiliation | Collaboration | ¹Carnegie Mellon University, ²Facebook AI Research. {soyeonm, pradeepr, ybisk, rsalakhu}@cs.cmu.edu; dchaplot@fb.com |
| Pseudocode | Yes | Below, we present pseudocode for the deterministic navigation/interaction policy. We first present explanations of some terms. ... Algorithm 1: Navigation/interaction algorithm in an episode. (A hedged toy sketch of this navigate-then-interact loop follows the table.) |
| Open Source Code | Yes | Project webpage with code and pre-trained models: https://soyeonm.github.io/FILM_webpage/ ... Project webpage with code, pre-trained models, and protocols to reproduce results is released here: https://soyeonm.github.io/FILM_webpage/. |
| Open Datasets | Yes | On the ALFRED (Shridhar et al., 2020) benchmark, FILM achieves State-of-the-Art performance (24.46%) with a large margin (8% absolute) from the previous SOTA (Blukis et al., 2021). |
| Dataset Splits | Yes | The test set consists of Tests Seen (1533 episodes) and Tests Unseen (1529 episodes); the scenes of the latter consist entirely of rooms that do not appear in the training set, while those of the former consist only of scenes seen during training. Similarly, the validation set is partitioned into Valid Seen (820 episodes) and Valid Unseen (821 episodes). (The four split sizes are collected in a snippet after the table.) |
| Hardware Specification | No | The paper states that all learned models were trained using Ai2Thor (Kolve et al., 2019) but does not provide specific details about the hardware (e.g., GPU model, CPU, memory) used for training or experimentation. |
| Software Dependencies | No | In the LP module, BERT type classification and argument classification were trained with AdamW from the Transformers (Wolf et al., 2019) package; learning rates are 1e-6 for type classification and {1e-4, 1e-5, 5e-5, 5e-5} for each of the object, parent, recep, and sliced argument classifiers. ... Mask R-CNN (He et al., 2017) (and its implementation by Shridhar et al. (2021)) and the depth prediction method of Blukis et al. (2021). ... Fine-tuning a pre-trained BART (Lewis et al., 2020) model... While the paper mentions several software packages and models, it does not provide specific version numbers for these dependencies (e.g., PyTorch version, Transformers library version, or Python version). |
| Experiment Setup | Yes | Training Details of Learned Components: In the LP module, BERT type classification and argument classification were trained with AdamW from the Transformers (Wolf et al., 2019) package; learning rates are 1e-6 for type classification and {1e-4, 1e-5, 5e-5, 5e-5} for each of the object, parent, recep, and sliced argument classifiers. In the Semantic Mapping module, separate depth models for camera horizons of 45 and 0 were fine-tuned from an existing model of HLSM (Blukis et al., 2021), both with learning rate 1e-3 and the AdamW optimizer (epsilon 1e-6, weight decay 1e-2). Similarly, separate instance segmentation models for small and large objects were fine-tuned, starting from their respective parameters released by Shridhar et al. (2021), with learning rate 1e-3 and the SGD optimizer (momentum 0.9, weight decay 5e-4). Finally, the semantic search policy was trained with learning rate 5e-4 and the AdamW optimizer (epsilon 1e-6). (These optimizer settings are collected in a PyTorch sketch after the table.) |
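
The Research Type row quotes the paper's definition of Success Rate (SR). As a worked illustration, here is a minimal Python sketch of that metric, assuming episodes are represented as lists of per-subtask completion flags (our representation, not FILM's code):

```python
from typing import List

def episode_success(subtasks_completed: List[bool]) -> bool:
    """SR is binary per episode: success only if *all* subtasks were completed."""
    return all(subtasks_completed)

def success_rate(episodes: List[List[bool]]) -> float:
    """Average the binary success indicator over episodes."""
    if not episodes:
        return 0.0
    return sum(episode_success(e) for e in episodes) / len(episodes)

# Two episodes; only the first completes every subtask -> SR = 0.5
print(success_rate([[True, True], [True, False]]))  # 0.5
```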
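The Pseudocode row references Algorithm 1, the deterministic navigation/interaction policy. Below is a heavily hedged toy sketch of the navigate-then-interact loop structure only: the grid world, `INTERACT_RADIUS`, and all function names are our illustrative assumptions, not FILM's implementation (FILM navigates on a semantic top-down map).

```python
from dataclasses import dataclass
from typing import List, Tuple

INTERACT_RADIUS = 1  # assumption: interact once within one grid cell of the goal

@dataclass
class Agent:
    pos: Tuple[int, int]

    def step_toward(self, goal: Tuple[int, int]) -> None:
        """Take one deterministic step along each axis toward the goal."""
        x, y = self.pos
        gx, gy = goal
        self.pos = (x + (gx > x) - (gx < x), y + (gy > y) - (gy < y))

    def distance_to(self, goal: Tuple[int, int]) -> int:
        return max(abs(self.pos[0] - goal[0]), abs(self.pos[1] - goal[1]))

def run_episode(agent: Agent, subtasks: List[Tuple[str, Tuple[int, int]]]) -> None:
    """For each (action, goal) subtask: navigate until close, then interact."""
    for action, goal in subtasks:
        while agent.distance_to(goal) > INTERACT_RADIUS:
            agent.step_toward(goal)
        print(f"{action} at {goal}, agent at {agent.pos}")

# Usage: pick up an object, then put it down elsewhere (ALFRED-style action names)
run_episode(Agent(pos=(0, 0)), [("PickupObject", (3, 4)), ("PutObject", (6, 1))])
```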
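The Dataset Splits row lists four evaluation partitions. Collecting the quoted counts in one place (the split sizes come from the paper; the dict layout is ours):

```python
# ALFRED evaluation split sizes as quoted above.
ALFRED_SPLITS = {
    "tests_seen": 1533,    # scenes seen during training
    "tests_unseen": 1529,  # rooms that never appear in training
    "valid_seen": 820,
    "valid_unseen": 821,
}

total_eval = sum(ALFRED_SPLITS.values())
print(total_eval)  # 4703 evaluation episodes across the four splits
```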
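Finally, the Experiment Setup row reports per-component optimizer settings. A minimal PyTorch sketch collecting those hyperparameters; the placeholder modules stand in for the actual models, and only the optimizer hyperparameters come from the paper:

```python
import torch
import torch.nn as nn

# Placeholder modules; dimensions are arbitrary stand-ins, not FILM's architectures.
bert_type_clf = nn.Linear(768, 7)   # stand-in for the BERT task-type classifier
depth_model   = nn.Conv2d(3, 1, 3)  # stand-in for an HLSM-style depth model
seg_model     = nn.Conv2d(3, 1, 3)  # stand-in for a Mask R-CNN segmentation model
search_policy = nn.Linear(240, 73)  # stand-in for the semantic search policy

# LP module: BERT type classification, AdamW at lr 1e-6.
# (The argument classifiers use lrs {1e-4, 1e-5, 5e-5, 5e-5} for
#  object/parent/recep/sliced and are set up the same way.)
opt_type = torch.optim.AdamW(bert_type_clf.parameters(), lr=1e-6)

# Semantic Mapping: depth models fine-tuned with AdamW
# (lr 1e-3, eps 1e-6, weight decay 1e-2).
opt_depth = torch.optim.AdamW(depth_model.parameters(), lr=1e-3,
                              eps=1e-6, weight_decay=1e-2)

# Instance segmentation: fine-tuned with SGD (lr 1e-3, momentum 0.9, weight decay 5e-4).
opt_seg = torch.optim.SGD(seg_model.parameters(), lr=1e-3,
                          momentum=0.9, weight_decay=5e-4)

# Semantic search policy: AdamW (lr 5e-4, eps 1e-6).
opt_search = torch.optim.AdamW(search_policy.parameters(), lr=5e-4, eps=1e-6)
```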