Embodied Multimodal Multitask Learning
Authors: Devendra Singh Chaplot, Lisa Lee, Ruslan Salakhutdinov, Devi Parikh, Dhruv Batra
IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We create scenarios and datasets to quantify cross-task knowledge transfer and show that the proposed model outperforms a range of baselines in simulated 3D environments. We also show that this disentanglement of representations makes our model modular and interpretable which allows for transfer to instructions containing new concepts. ... We train all models for 10 million frames in the Easy setting and 50 million frames in the Hard setting. ... In Table 2, we report the performance of all models for both Easy and Hard settings. ... We perform several ablation tests to analyze the contribution of each component in the Dual-Attention unit. |
| Researcher Affiliation | Collaboration | Devendra Singh Chaplot¹, Lisa Lee¹, Ruslan Salakhutdinov¹, Devi Parikh²,³, Dhruv Batra²,³; ¹Carnegie Mellon University, ²Facebook AI Research, ³Georgia Institute of Technology; {chaplot, lslee, rsalakhu}@cs.cmu.edu, {parikh, dbatra}@gatech.edu |
| Pseudocode | No | The paper describes the architecture and operations in text and diagrams but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We will open-source the code for the training environment, datasets, and model implementation including all hyper-parameter details to support reproducibility and future work in this direction. |
| Open Datasets | Yes | We use the set of objects and attributes from Chaplot et al. [2018] and create a dataset which includes instructions and questions about object types, colors, relative sizes (tall/short) and superlative sizes (smallest/largest). We will open-source the code for the training environment, datasets, and model implementation including all hyper-parameter details to support reproducibility and future work in this direction. |
| Dataset Splits | No | The paper explicitly mentions 'train-test splits' and provides 'training and test sets' in Table 1, but does not specify a separate 'validation' split or its details. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions general components like 'convolutional neural network' and 'Gated Recurrent Unit (GRU)' and algorithms like 'Proximal Policy Optimization (PPO)', but does not provide specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8'). |
| Experiment Setup | Yes | We train all models for 10 million frames in the Easy setting and 50 million frames in the Hard setting. We use a +1 reward for reaching the correct object in SGN episodes and predicting the correct answer in EQA episodes. We use a small negative reward of -0.001 per time step to encourage shorter paths to the target and answering questions as soon as possible. We also use distance-based reward shaping for SGN episodes... All episodes have a maximum length of 210 time steps. We train all models with and without the auxiliary tasks using identical reward functions. ... For training the answer predictions, we use a supervised cross-entropy loss. |
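
As a reading aid for the Experiment Setup row above, here is a minimal Python sketch of the per-step reward it quotes. The +1 success reward, the -0.001 step penalty, and the 210-step episode cap come directly from the quoted text; the shaping coefficient, the function name `compute_reward`, and all other identifiers are illustrative assumptions, not details from the authors' implementation.

```python
# Minimal sketch of the reward structure quoted in the Experiment Setup row.
# Names and the shaping coefficient below are assumptions for illustration;
# only the +1 success reward, -0.001 step penalty, and 210-step cap are
# stated in the paper quote.

STEP_PENALTY = -0.001      # small negative reward per time step (from the paper)
SUCCESS_REWARD = 1.0       # +1 for reaching the correct object (SGN) or answering correctly (EQA)
MAX_EPISODE_LEN = 210      # maximum episode length in time steps
SHAPING_COEF = 0.1         # assumed scale for distance-based shaping (value not given)


def compute_reward(task, success, prev_dist=None, curr_dist=None):
    """Per-step reward for SGN (semantic goal navigation) and EQA episodes.

    task      : "SGN" or "EQA"
    success   : True if the agent reached the correct object (SGN) or
                predicted the correct answer (EQA) at this step
    prev_dist : distance to the target at the previous step (SGN only)
    curr_dist : distance to the target at the current step (SGN only)
    """
    reward = STEP_PENALTY
    if task == "SGN" and prev_dist is not None and curr_dist is not None:
        # Distance-based reward shaping: reward progress toward the target.
        reward += SHAPING_COEF * (prev_dist - curr_dist)
    if success:
        reward += SUCCESS_REWARD
    return reward
```

For example, `compute_reward("SGN", success=False, prev_dist=5.2, curr_dist=4.9)` returns the step penalty plus a small positive shaping term. Note that the supervised cross-entropy loss mentioned for answer prediction is a separate training signal and is not part of this reinforcement-learning reward.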