DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning

Authors: Jianxiong Li, Jinliang Zheng, Yinan Zheng, Liyuan Mao, Xiao Hu, Sijie Cheng, Haoyi Niu, Jihao Liu, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Xianyuan Zhan

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluation on both simulated and real robots demonstrates that DecisionNCE effectively facilitates diverse downstream policy learning tasks, offering a versatile solution for unified representation and reward learning.
Researcher Affiliation | Collaboration | (1) AIR, Tsinghua University; (2) SenseTime Research; (3) Shanghai Jiao Tong University; (4) CUHK MMLab; (5) Shanghai AI Lab
Pseudocode | Yes | Algorithm 1: DecisionNCE-P/T (see the sketch after this table)
Open Source Code | Yes | Project page: https://2toinf.github.io/DecisionNCE/
Open Datasets | Yes | The DecisionNCE encoders are pretrained on the large-scale human video dataset EPIC-KITCHENS-100 (Damen et al., 2018).
Dataset Splits | No | The paper mentions using 1/3/5 demonstrations for training and evaluating performance during training ('We evaluate the policy for 25 episodes per 2e3 gradient steps and report the max success rate over the training'), but does not explicitly state training/validation splits or percentages for its datasets.
Hardware Specification | Yes | Training takes only about 9 hours on four A100 GPUs, which the authors report as higher training efficiency than previous work.
Software Dependencies | No | The paper mentions a modified ResNet-50 from CLIP and a CLIP transformer as image encoders and DistilBERT as the language encoder, but does not provide version numbers for these or for other software libraries (e.g., PyTorch, TensorFlow).
Experiment Setup | Yes | The training hyperparameters used during pre-training are listed in Table 4.
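The DecisionNCE-P/T pseudocode itself is not reproduced on this page. As a rough orientation only, the sketch below shows the kind of objective the paper's implicit preference learning reduces to: a symmetric InfoNCE loss where each video segment is represented by the difference between its end-frame and start-frame embeddings and contrasted against in-batch language embeddings. The function name, argument names, and temperature value are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def decision_nce_loss(start_emb, end_emb, lang_emb, temperature=0.1):
    """Hypothetical sketch of a DecisionNCE-style symmetric InfoNCE loss.

    start_emb, end_emb: (B, D) embeddings of the first and last frames of
    each video segment; lang_emb: (B, D) embeddings of the paired language
    instructions. Pairs sharing a batch index are positives; all other
    in-batch combinations serve as negatives.
    """
    # Represent a segment by its displacement in embedding space, then
    # compare directions via cosine similarity.
    seg = F.normalize(end_emb - start_emb, dim=-1)
    lang = F.normalize(lang_emb, dim=-1)

    # (B, B) similarity logits between every segment and every instruction.
    logits = seg @ lang.t() / temperature
    targets = torch.arange(seg.size(0), device=seg.device)

    # Contrast in both directions: segment -> language and language -> segment.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

If the loss does take this form, the same segment-language similarity score could plausibly double as a language-conditioned reward at deployment time, which would account for the "unified representation and reward learning" claim in the Research Type row above.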