Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents

Authors: Wonje Choi, Woo Kyung Kim, Seung Hyun Kim, Honguk Woo

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments, we show that CONPE outperforms other state-of-the-art algorithms for several embodied agent tasks including navigation in AI2THOR, manipulation in egocentric-Metaworld, and autonomous driving in CARLA, while also improving the sample efficiency of policy learning and adaptation.
Researcher Affiliation | Academia | Wonje Choi, Woo Kyung Kim, Seung Hyun Kim, Honguk Woo; Department of Computer Science and Engineering, Sungkyunkwan University; {wjchoi1995, kwk2696, kimsh571, hwoo}@skku.edu
Pseudocode | Yes | Algorithm 1: Procedure of CONPE Framework
Open Source Code | No | The paper states: "We create the datasets with various visual domains in AI2THOR, egocentric-Metaworld and CARLA, and make them publicly accessible for further research on RL policy adaptation." This refers to the datasets, not the source code for the method. There is no explicit statement or link indicating that the source code is open-source or publicly available.
Open Datasets | Yes | We create the datasets with various visual domains in AI2THOR, egocentric-Metaworld and CARLA, and make them publicly accessible for further research on RL policy adaptation.
Dataset Splits | No | The paper describes how domains are categorized for zero-shot evaluation (seen vs. unseen target domains) and gives specific counts for these evaluation sets (30 seen, 10 unseen). However, it does not provide conventional training/validation/test splits with percentages or sample counts for a single dataset; the seen target domains are used for evaluation, not for hyperparameter validation during training in the conventional sense.
Hardware Specification | No | The paper does not describe the hardware used to run its experiments, such as specific GPU or CPU models. It mentions the CLIP model and frameworks such as PPO and DAGGER, but not the underlying hardware.
Software Dependencies | No | The paper mentions several software components, including the CLIP model, VPT, CoOp, PPO, DAGGER, AI2THOR, Metaworld, and CARLA, but it does not provide the specific version numbers needed for a reproducible description.
Experiment Setup | Yes | We implement CONPE using the CLIP model with ViT-B/32, similar to VPT [17] and CoOp [18]. In prompt-based contrastive learning, ... the prompt length sets to be 8. In policy learning, we exploit online learning (i.e., PPO [23]) for AI2THOR and imitation learning (i.e., DAGGER [24]) for egocentric-Metaworld and CARLA. ... For prompt-based contrastive learning (in Section 3.2), we use a small dataset of expert demonstrations for each domain factor (i.e., 10 episodes per domain factor). For prompt ensemble-based policy learning (in Section 3.3), we use a few source domains randomly generated through combinatorial variations of the seen domain factors (i.e., 4 source domains).
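To make the quoted experiment setup concrete, below is a minimal PyTorch sketch of the components it names: a frozen CLIP ViT-B/32 backbone, VPT-style learnable visual prompts of length 8, and an attention-weighted ensemble over per-prompt image embeddings. This is an illustrative reading, not the authors' implementation: `NUM_PROMPTS`, `VisualPrompt`, and `PromptEnsemble` are hypothetical names, and the paper's guided-attention ensemble is simplified here to a single learned scorer.

```python
import torch
import torch.nn as nn
import clip  # OpenAI CLIP: https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen CLIP ViT-B/32 backbone, as described in the quoted setup.
clip_model, preprocess = clip.load("ViT-B/32", device=device)
for p in clip_model.parameters():
    p.requires_grad_(False)

PROMPT_LEN = 8    # prompt length from the quoted setup
VIT_WIDTH = 768   # internal transformer width of ViT-B/32
FEAT_DIM = 512    # CLIP ViT-B/32 output embedding dimension
NUM_PROMPTS = 12  # hypothetical: one learnable prompt per domain factor

class VisualPrompt(nn.Module):
    """VPT-style learnable tokens prepended to the ViT patch sequence.

    Injecting these tokens requires hooking CLIP's visual forward
    pass; that wiring is elided here for brevity.
    """
    def __init__(self, length: int = PROMPT_LEN, width: int = VIT_WIDTH):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(length, width) * 0.02)

class PromptEnsemble(nn.Module):
    """Attention-weighted ensemble over per-prompt image embeddings
    (a simplified stand-in for the paper's guided-attention ensemble)."""
    def __init__(self, feat_dim: int = FEAT_DIM):
        super().__init__()
        self.scorer = nn.Linear(feat_dim, 1)

    def forward(self, per_prompt_feats: torch.Tensor) -> torch.Tensor:
        # per_prompt_feats: (batch, NUM_PROMPTS, FEAT_DIM)
        weights = self.scorer(per_prompt_feats).softmax(dim=1)
        return (weights * per_prompt_feats).sum(dim=1)  # (batch, FEAT_DIM)

prompts = nn.ModuleList(VisualPrompt() for _ in range(NUM_PROMPTS))
ensemble = PromptEnsemble().to(device)
```

In this reading, the PPO or DAGGER-trained policy would consume the ensembled embedding as its visual state representation; that policy head, like the prompt-token injection into CLIP's visual transformer, is omitted from the sketch.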