VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training
Authors: Yecheng Jason Ma, Shagun Sodhani, Dinesh Jayaraman, Osbert Bastani, Vikash Kumar, Amy Zhang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Trained on large-scale Ego4D human videos and without any fine-tuning on in-domain, task-specific data, VIP can provide dense visual reward for an extensive set of simulated and real-robot tasks, enabling diverse reward-based visual control methods and outperforming all prior pre-trained representations. (A hedged sketch of this reward computation appears after the table.) |
| Researcher Affiliation | Collaboration | FAIR, Meta AI; University of Pennsylvania |
| Pseudocode | Yes | Algorithm 1 Value-Implicit Pre-Training (VIP) ... Algorithm 2 VIP PyTorch Pseudocode (a hedged sketch of the loss appears after the table) |
| Open Source Code | Yes | We have open-sourced code for using our pre-trained VIP model and training a new VIP model using any custom video dataset at https://github.com/facebookresearch/vip; instructions for model training and inference are included in the README.md file in the supplementary material, and the hyperparameters are already configured. (A usage sketch appears after the table.) |
| Open Datasets | Yes | Trained on the large-scale, in-the-wild Ego4D human video dataset (Grauman et al., 2022) using a simple sparse reward... We also consider a self-supervised ResNet50 network trained on ImageNet (Deng et al., 2009) using Momentum Contrastive (MoCo), a supervised ResNet50 network trained on ImageNet |
| Dataset Splits | No | The paper does not explicitly state specific train/validation/test dataset splits (e.g., percentages or absolute counts) for the Ego4D dataset or other datasets used directly for VIP training or evaluation, beyond mentioning 'training set' and 'test rollouts'. |
| Hardware Specification | No | The paper mentions running experiments on 'real 7-DOF Franka robot' and refers to 'simulator' environments, but does not provide specific details about the CPU, GPU, or memory used for training or inference. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' for the pseudocode and 'Adam (Kingma & Ba, 2014)' as the optimizer, but does not specify version numbers for other ancillary software like Python, CUDA, or specific libraries beyond their citations. |
| Experiment Setup | Yes | Additionally, we use the exact same hyperparameters (e.g., batch size, optimizer, learning rate) as in Nair et al. (2022). See App. D for details. ... Table 2: VIP Architecture & Hyperparameters. Optimizer: Adam (Kingma & Ba, 2014); Learning rate: 0.0001; L1 weight penalty: 0.001; Mini-batch size: 32; Discount factor γ: 0.98. (A training-step sketch using these values appears after the table.) |
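
The Algorithm 2 pseudocode cited in the table implements the VIP objective in PyTorch. Below is a minimal sketch of that loss under stated assumptions: `phi` is an image encoder (the paper uses a ResNet50 backbone), the batch supplies initial, current, next, and goal frames sampled from sub-trajectories, and the function name `vip_loss` is illustrative rather than the authors' exact code.

```python
import torch

def vip_loss(phi, o_0, o_t, o_t1, g, gamma=0.98):
    """Sketch of the VIP objective: attract initial frames toward goal
    frames, and repel intermediate transitions via a log-mean-exp over
    the negated one-step temporal-difference term."""
    e0, et, et1, eg = phi(o_0), phi(o_t), phi(o_t1), phi(g)
    # Implicit goal-conditioned value: negative L2 distance in embedding space.
    V_0  = -torch.linalg.norm(e0  - eg, dim=-1)
    V_t  = -torch.linalg.norm(et  - eg, dim=-1)
    V_t1 = -torch.linalg.norm(et1 - eg, dim=-1)
    r_tilde = -1.0  # sparse "not yet at goal" reward used for Ego4D training
    attract = (1.0 - gamma) * (-V_0).mean()
    td = r_tilde + gamma * V_t1 - V_t
    # log E[exp(-td)] over the mini-batch: log-sum-exp minus log(batch size).
    repel = torch.logsumexp(-td, dim=0) - torch.log(
        torch.tensor(float(td.shape[0])))
    return attract + repel
```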
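
The experiment-setup row reports Adam with learning rate 0.0001, an L1 weight penalty of 0.001, mini-batch size 32, and γ = 0.98. The sketch below wires these together with `vip_loss` from above; the stand-in encoder and the choice to apply the L1 penalty to the weights (the quoted cell does not say whether it targets weights or embeddings) are assumptions.

```python
import torch

# Hyperparameters as reported in Table 2 of the paper (batch size is 32).
LR, L1_PENALTY, GAMMA = 1e-4, 1e-3, 0.98

# Stand-in encoder; the paper trains a ResNet50 backbone.
phi = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 1024))
opt = torch.optim.Adam(phi.parameters(), lr=LR)

def train_step(o_0, o_t, o_t1, g):
    loss = vip_loss(phi, o_0, o_t, o_t1, g, gamma=GAMMA)
    # Adam's weight_decay implements an L2 penalty, so the reported
    # L1 penalty is added to the loss explicitly here.
    loss = loss + L1_PENALTY * sum(p.abs().sum() for p in phi.parameters())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```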
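
Finally, for using the released model as a dense visual reward: the repository README describes loading the frozen pre-trained encoder (the `load_vip` import below is assumed from the README; verify against the repo), and the paper defines the reward for a transition as the change in embedding distance to a goal image.

```python
import torch
from vip import load_vip  # assumed entry point from the repo README

vip = load_vip()
vip.eval()  # frozen representation; no in-domain fine-tuning

@torch.no_grad()
def vip_reward(o, o_next, g):
    """Dense reward for a transition (o -> o_next) toward goal image g:
    the increase in the implicit value V = -||phi(o) - phi(g)||."""
    e, e_next, eg = vip(o), vip(o_next), vip(g)
    V      = -torch.linalg.norm(e      - eg, dim=-1)
    V_next = -torch.linalg.norm(e_next - eg, dim=-1)
    return V_next - V  # positive when o_next moves closer to the goal
```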