Video Prediction Models as Rewards for Reinforcement Learning

Authors: Alejandro Escontrela, Ademi Adeniji, Wilson Yan, Ajay Jain, Xue Bin Peng, Ken Goldberg, Youngwoon Lee, Danijar Hafner, Pieter Abbeel

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform an extensive evaluation, and show that VIPER can achieve expert-level control without task rewards on 15 DMC tasks [44], 6 RLBench tasks [20], and 7 Atari tasks [3] (see examples in Figure 2 and Appendix A.5).
Researcher Affiliation | Academia | University of California, Berkeley
Pseudocode | Yes | Algorithm 1 (VIPER): Train video prediction model pθ on expert videos. (See the sketch after this table.)
Open Source Code | Yes | Source code and datasets are available on the project website: https://escontrela.me/viper
Open Datasets | Yes | We utilize 15 tasks from the DeepMind Control (DMC) suite [44], 7 tasks from the Atari Gym suite [4], and 6 tasks from the Robot Learning Benchmark (RLBench) [20].
Dataset Splits | No | The paper does not explicitly define training, validation, and test splits (percentages, sample counts, or citations to predefined splits) for its models. It describes collecting data to train the video prediction models and then training RL agents within the environments.
Hardware Specification | Yes | All models are trained on TPUv3-8 instances, which are roughly equivalent to 4 Nvidia V100 GPUs. DMC agents were trained using 1 Nvidia V100 GPU, while Atari and RLBench agents were trained using 1 Nvidia A100 GPU.
Software Dependencies | No | The paper names the models and algorithms used (e.g., DrQ, DreamerV3, VQ-GAN, VideoGPT) but does not specify software dependencies such as programming languages, libraries, or frameworks with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | Further details on model architecture and hyperparameters can be found in Appendix A.2. All models are trained on TPUv3-8 instances, which are roughly equivalent to 4 Nvidia V100 GPUs. Appendix A.2 (Tables 1 and 2) lists hyperparameters such as input size, latent size, batch size, learning rate, training steps, and model dimensions. Appendix A.3 (Tables 3 and 4) provides hyperparameters and training details for DreamerV3 and DrQ, including replay capacity, batch size, batch length, MLP size, learning rate, and other algorithm-specific settings.
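To make the Pseudocode row concrete, below is a minimal sketch of the idea behind Algorithm 1 as summarized above: a video prediction model pθ is first trained on expert videos, and its log-likelihood of the agent's observed frames is then used as a reward signal. This is an illustrative paraphrase, not the paper's implementation; the class, the `log_prob` interface, and the exploration-bonus mixing weight are hypothetical stand-ins.

# Hedged sketch of a VIPER-style learned reward (Python); names are hypothetical.
class VideoPredictionReward:
    """Scores agent frames with a video prediction model pretrained on expert videos."""

    def __init__(self, video_model, expl_weight=0.5):
        self.video_model = video_model  # stands in for p_theta from Algorithm 1
        self.expl_weight = expl_weight  # illustrative trade-off with an exploration bonus
        self.context = []               # recent frames observed by the agent

    def reward(self, next_frame, expl_bonus=0.0):
        # Reward the agent for producing frames the expert video model finds likely,
        # optionally mixed with an exploration bonus as described in the paper.
        log_prob = self.video_model.log_prob(context=self.context, frame=next_frame)
        self.context.append(next_frame)
        return (1.0 - self.expl_weight) * log_prob + self.expl_weight * expl_bonus

In use, such a reward would replace the task reward in a standard RL loop (e.g., a DreamerV3 or DrQ agent), which is how the paper reports training its agents without environment rewards.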