Pre-Trained Language Models for Interactive Decision-Making

Authors: Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that this framework enables effective combinatorial generalization across different environments and supervisory modalities. We begin by assuming access to a set of expert demonstrations, and show that initializing policies with LMs and fine-tuning them via behavior cloning improves task completion rates by 43.6% in the VirtualHome environment. Next, we integrate an active data gathering procedure in which agents iteratively interact with the environment, relabel past failed experiences with new goals, and update their policies in a self-supervised loop. Active data gathering further improves combinatorial generalization, outperforming the best baseline by 25.1%. Finally, we explain these results by investigating three possible factors underlying the effectiveness of the LM-based policy. (A hedged code sketch of these two training stages appears after the table.)
Researcher Affiliation | Collaboration | ¹MIT, ²Nvidia, ³Caltech, ⁴Google Brain, ⁵UT Austin
Pseudocode | No | The paper describes the methods and procedures in text and with diagrams (e.g., Figure 2), but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | 3. (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] In the supplemental material.
Open Datasets | Yes | We use BabyAI [16] and VirtualHome [31] to evaluate the proposed method. [...] We use the standard training and test data provided by [16].
Dataset Splits | Yes | 3. (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Section 6 and Appendix C.2.
Hardware Specification | Yes | 3. (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] See Appendix C.2. (Appendix C.2 states: 'All experiments are conducted on 8 Nvidia V100 GPUs with 32GB memory.')
Software Dependencies | No | The paper names one software dependency: 'Our implementation is based on Stable Baselines3 [35].' (Section 6.2). However, it does not give version numbers for this or any other software component used in the experiments.
Experiment Setup | Yes | 3. (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See Section 6 and Appendix C.2. (Section 6.1.1 states: 'Each method is trained on 20K demos from the VirtualHome-Imitation Learning dataset.')
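
The Research Type row above summarizes two training stages: behavior cloning from expert demonstrations on a policy initialized from a pre-trained LM, and an active data gathering loop that relabels failed episodes with the goals they actually achieved. Below is a minimal, self-contained sketch of those two ideas, not the authors' released code: the small Transformer encoder merely stands in for a pre-trained LM such as GPT-2, and the tokenization, dummy data, and helper names (`LMPolicy`, `bc_loss`, `hindsight_relabel`) are all hypothetical.

```python
# Hedged sketch of (1) behavior cloning on an LM-initialized policy and
# (2) hindsight relabeling during active data gathering. All names and
# shapes here are illustrative stand-ins, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, N_ACTIONS, SEQ_LEN = 100, 64, 8, 12

class LMPolicy(nn.Module):
    """Action head on top of a language-model-style encoder.

    The encoder here is randomly initialized for runnability; in the paper
    the weights would come from a pre-trained LM (e.g., GPT-2)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        layer = nn.TransformerEncoderLayer(d_model=HIDDEN, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(HIDDEN, N_ACTIONS)

    def forward(self, tokens):
        # tokens: (batch, seq) holding tokenized goal + observation history
        h = self.encoder(self.embed(tokens))
        return self.head(h[:, -1])  # action logits from the final position

def bc_loss(policy, tokens, expert_actions):
    """Behavior cloning: cross-entropy against the expert's actions."""
    return F.cross_entropy(policy(tokens), expert_actions)

def hindsight_relabel(goal, rollout_tokens, achieved_goal, success):
    """Turn a failed rollout into a valid demonstration by swapping in the
    goal the agent actually achieved (hindsight relabeling)."""
    return (goal if success else achieved_goal), rollout_tokens

if __name__ == "__main__":
    policy = LMPolicy()
    opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

    # Stage 1: behavior cloning on (dummy) expert demonstrations.
    demo_tokens = torch.randint(0, VOCAB, (32, SEQ_LEN))
    demo_actions = torch.randint(0, N_ACTIONS, (32,))
    for _ in range(3):
        opt.zero_grad()
        loss = bc_loss(policy, demo_tokens, demo_actions)
        loss.backward()
        opt.step()

    # Stage 2: a failed episode is stored under the goal it actually
    # achieved, so the agent keeps learning from its own interactions.
    goal = torch.randint(0, VOCAB, (1, 4))
    rollout = torch.randint(0, VOCAB, (1, SEQ_LEN - 4))
    achieved = torch.randint(0, VOCAB, (1, 4))
    new_goal, traj = hindsight_relabel(goal, rollout, achieved, success=False)
    relabeled = torch.cat([new_goal, traj], dim=1)  # feed back into bc_loss
    print("relabeled batch shape:", relabeled.shape)
```

In the paper's actual pipeline, the encoder weights come from a pre-trained LM and the rollouts come from VirtualHome or BabyAI; the random tensors above only keep the sketch runnable end to end.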