VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training

Authors: Yecheng Jason Ma, Shagun Sodhani, Dinesh Jayaraman, Osbert Bastani, Vikash Kumar, Amy Zhang

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Trained on large-scale Ego4D human videos and, without any fine-tuning on in-domain, task-specific data, VIP can provide dense visual reward for an extensive set of simulated and real-robot tasks, enabling diverse reward-based visual control methods and outperforming all prior pre-trained representations.
Researcher Affiliation | Collaboration | FAIR, Meta AI; University of Pennsylvania
Pseudocode | Yes | Algorithm 1 (Value-Implicit Pre-Training, VIP) gives the training procedure ... Algorithm 2 gives VIP PyTorch pseudocode (see the training-step sketch after the table).
Open Source Code | Yes | "We have open-sourced code for using our pre-trained VIP model and training a new VIP model using any custom video dataset at https://github.com/facebookresearch/vip; the instructions for model training and inference are included in the README.md file in the supplementary file, and the hyperparameters are already configured." (See the reward-computation sketch after the table.)
Open Datasets | Yes | Trained on the large-scale, in-the-wild Ego4D human video dataset (Grauman et al., 2022) using a simple sparse reward ... The paper also considers a self-supervised ResNet50 trained on ImageNet (Deng et al., 2009) with Momentum Contrast (MoCo), and a supervised ResNet50 trained on ImageNet.
Dataset Splits | No | The paper does not explicitly state train/validation/test splits (e.g., percentages or absolute counts) for the Ego4D dataset or the other datasets used for VIP training or evaluation, beyond mentioning a "training set" and "test rollouts".
Hardware Specification | No | The paper mentions experiments on a real 7-DOF Franka robot and in simulated environments, but does not give the CPU, GPU, or memory used for training or inference.
Software Dependencies | No | The paper cites PyTorch (Paszke et al., 2019) for the pseudocode and Adam (Kingma & Ba, 2014) as the optimizer, but does not specify version numbers for other software such as Python, CUDA, or supporting libraries beyond their citations.
Experiment Setup | Yes | "Additionally, we use the exact same hyperparameters (e.g., batch size, optimizer, learning rate) as in Nair et al. (2022). See App. D for details." ... Table 2 (VIP Architecture & Hyperparameters): optimizer Adam (Kingma & Ba, 2014), learning rate 0.0001, L1 weight penalty 0.001, mini-batch size 32, discount factor γ = 0.98. (These values are wired into the training-step sketch below.)
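
The Pseudocode and Experiment Setup rows refer to Algorithm 2's PyTorch pseudocode and the Table 2 hyperparameters. The following is a minimal, self-contained sketch of what such a training step can look like, assuming the objective form described in the paper (value as negative goal-embedding distance, an attractive term on initial frames plus a log-exp temporal-consistency term) and the Table 2 values. The encoder head size, frame sampling, and data pipeline are simplified placeholders, not the released implementation.

```python
import torch
import torchvision

GAMMA = 0.98       # discount factor (Table 2)
L1_WEIGHT = 1e-3   # L1 weight penalty (Table 2)
LR = 1e-4          # Adam learning rate (Table 2)
BATCH_SIZE = 32    # mini-batch size (Table 2)

# Stand-in visual encoder phi; the paper uses a ResNet50 backbone.
encoder = torchvision.models.resnet50(num_classes=1024)
optimizer = torch.optim.Adam(encoder.parameters(), lr=LR)

def value(phi_o: torch.Tensor, phi_g: torch.Tensor) -> torch.Tensor:
    """V(o; g) = -||phi(o) - phi(g)||_2: negative embedding distance to the goal."""
    return -torch.norm(phi_o - phi_g, dim=-1)

def vip_loss(o0, ot, ot1, g) -> torch.Tensor:
    """VIP-style objective on a batch of (o_0, o_t, o_{t+1}, goal) frames."""
    e0, et, et1, eg = (encoder(x) for x in (o0, ot, ot1, g))
    v0, vt, vt1 = value(e0, eg), value(et, eg), value(et1, eg)
    r = -1.0  # sparse "time cost" reward for non-goal frames
    # Attractive term: pull initial-frame embeddings toward goal embeddings.
    attract = (1 - GAMMA) * (-v0).mean()
    # Repulsive log-exp term: enforce one-step temporal consistency of the
    # implicitly defined value function.
    repel = torch.log(torch.exp(-(r + GAMMA * vt1 - vt)).mean() + 1e-6)
    # L1 penalty on embeddings (assumed interpretation of the reported
    # "L1 weight penalty").
    l1 = L1_WEIGHT * (e0.abs().mean() + eg.abs().mean())
    return attract + repel + l1

# One optimization step on random stand-in frames.
o0, ot, ot1, g = (torch.randn(BATCH_SIZE, 3, 224, 224) for _ in range(4))
optimizer.zero_grad()
vip_loss(o0, ot, ot1, g).backward()
optimizer.step()
```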
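
The Research Type and Open Source Code rows describe using the frozen, pre-trained VIP model as a dense visual reward. Below is a hedged usage sketch: the load_vip entry point and the [0, 255] input convention are assumptions about the released repository's API, and the reward is written as the one-step reduction in embedding distance to a goal image, one common way to turn such an encoder into a dense reward signal.

```python
import torch
from PIL import Image
from torchvision import transforms

# Assumed entry point for https://github.com/facebookresearch/vip; if the
# released API differs, treat load_vip() as a placeholder for "load the frozen
# pre-trained VIP ResNet50".
from vip import load_vip

vip_model = load_vip()
vip_model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

@torch.no_grad()
def embed(frame: Image.Image) -> torch.Tensor:
    # Assumption: the model expects image tensors scaled to [0, 255].
    return vip_model(preprocess(frame).unsqueeze(0) * 255.0)

@torch.no_grad()
def dense_reward(o_t: Image.Image, o_t1: Image.Image, goal: Image.Image) -> float:
    """Reward = one-step decrease in embedding distance to the goal frame."""
    e_t, e_t1, e_g = embed(o_t), embed(o_t1), embed(goal)
    dist = lambda a, b: torch.norm(a - b, dim=-1)
    return (dist(e_t, e_g) - dist(e_t1, e_g)).item()
```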