Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents
Authors: Wonje Choi, Woo Kyung Kim, Seung Hyun Kim, Honguk Woo
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments, we show that CONPE outperforms other state-of-the-art algorithms for several embodied agent tasks including navigation in AI2THOR, manipulation in egocentric-Metaworld, and autonomous driving in CARLA, while also improving the sample efficiency of policy learning and adaptation. |
| Researcher Affiliation | Academia | Wonje Choi, Woo Kyung Kim, Seung Hyun Kim, Honguk Woo; Department of Computer Science and Engineering, Sungkyunkwan University; {wjchoi1995, kwk2696, kimsh571, hwoo}@skku.edu |
| Pseudocode | Yes | Algorithm 1 Procedure of CONPE Framework |
| Open Source Code | No | The paper states: "We create the datasets with various visual domains in AI2THOR, egocentric-Metaworld and CARLA, and make them publicly accessible for further research on RL policy adaptation." This refers to datasets, not the source code for the methodology. There is no explicit statement or link indicating that the source code for their method is open-source or publicly available. |
| Open Datasets | Yes | We create the datasets with various visual domains in AI2THOR, egocentric-Metaworld and CARLA, and make them publicly accessible for further research on RL policy adaptation. |
| Dataset Splits | No | The paper describes how domains are categorized for zero-shot evaluation (seen target domains, unseen target domains) and gives specific counts for these evaluation sets (30 seen, 10 unseen). However, it does not provide conventional training/validation/test splits with percentages or sample counts for a single dataset; the 'seen target domains' serve as an evaluation set rather than a validation set for hyperparameter tuning during training. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for its experiments, such as specific GPU or CPU models. It mentions the use of the CLIP model and algorithms such as PPO and DAGGER, but not the underlying hardware. |
| Software Dependencies | No | The paper mentions several software components, including the CLIP model, VPT, CoOp, PPO, DAGGER, AI2THOR, Metaworld, and CARLA. However, it does not provide version numbers for these dependencies, which would be needed for a fully reproducible setup. |
| Experiment Setup | Yes | We implement CONPE using the CLIP model with ViT-B/32, similar to VPT [17] and CoOp [18]. In prompt-based contrastive learning, ... the prompt length is set to 8. In policy learning, we exploit online learning (i.e., PPO [23]) for AI2THOR and imitation learning (i.e., DAGGER [24]) for egocentric-Metaworld and CARLA. ... For prompt-based contrastive learning (in Section 3.2), we use a small dataset of expert demonstrations for each domain factor (i.e., 10 episodes per domain factor). For prompt ensemble-based policy learning (in Section 3.3), we use a few source domains randomly generated through combinatorial variations of the seen domain factors (i.e., 4 source domains). |
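
The Experiment Setup row describes a frozen CLIP ViT-B/32 backbone, learnable visual prompts of length 8, and an ensemble of prompt-conditioned embeddings used as the policy's state representation. Below is a minimal PyTorch sketch of that idea. It is an illustrative assumption, not the authors' implementation: `PromptEnsembleEncoder`, the toy backbone, and the mean-pooled prompt conditioning are placeholders (ConPE follows VPT/CoOp-style prompting, which operates inside the encoder itself).

```python
# Hypothetical sketch of a ConPE-style prompt ensemble encoder.
# All class and variable names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptEnsembleEncoder(nn.Module):
    """Frozen visual backbone + K learnable prompts (length 8 each),
    combined by attention into a single state embedding for the policy."""

    def __init__(self, backbone: nn.Module, embed_dim: int = 512,
                 num_prompts: int = 8, prompt_len: int = 8):
        super().__init__()
        self.backbone = backbone  # stand-in for a frozen CLIP ViT-B/32 encoder
        for p in self.backbone.parameters():
            p.requires_grad_(False)
        # One learnable prompt (prompt_len x embed_dim) per visual domain factor.
        self.prompts = nn.Parameter(
            torch.randn(num_prompts, prompt_len, embed_dim) * 0.02)
        # Attention projections for weighting the prompt-conditioned embeddings.
        self.query = nn.Linear(embed_dim, embed_dim)
        self.key = nn.Linear(embed_dim, embed_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        base = self.backbone(obs)                                   # (B, D)
        # Condition frozen features on each prompt (mean-pooled for brevity;
        # the paper injects prompt tokens inside the ViT instead).
        prompt_feats = base.unsqueeze(1) + self.prompts.mean(dim=1)  # (B, K, D)
        attn = torch.einsum("bd,bkd->bk",
                            self.query(base), self.key(prompt_feats))
        attn = F.softmax(attn / base.shape[-1] ** 0.5, dim=-1)       # (B, K)
        return torch.einsum("bk,bkd->bd", attn, prompt_feats)        # (B, D)


if __name__ == "__main__":
    # Toy frozen backbone standing in for CLIP ViT-B/32 (512-d output).
    backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 512))
    encoder = PromptEnsembleEncoder(backbone)
    state = encoder(torch.randn(2, 3, 224, 224))
    print(state.shape)  # torch.Size([2, 512])
```

In this sketch only the prompts and attention projections are trainable, mirroring the paper's setup in which the CLIP backbone stays frozen and each prompt captures one visual domain factor learned from roughly 10 expert episodes per factor.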