Video Prediction Models as Rewards for Reinforcement Learning
Authors: Alejandro Escontrela, Ademi Adeniji, Wilson Yan, Ajay Jain, Xue Bin Peng, Ken Goldberg, Youngwoon Lee, Danijar Hafner, Pieter Abbeel
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform an extensive evaluation, and show that VIPER can achieve expert-level control without task rewards on 15 DMC tasks [44], 6 RLBench tasks [20], and 7 Atari tasks [3] (see examples in Figure 2 and Appendix A.5). |
| Researcher Affiliation | Academia | University of California, Berkeley |
| Pseudocode | Yes | Algorithm 1 (VIPER): "Train video prediction model pθ on expert videos." A minimal sketch of the resulting reward appears after this table. |
| Open Source Code | Yes | Source code and datasets are available on the project website: https://escontrela.me/viper |
| Open Datasets | Yes | We utilize 15 tasks from the Deep Mind Control (DMC) suite [44], 7 tasks from the Atari Gym suite [4], and 6 tasks from the Robot Learning Benchmark (RLBench) [20]. |
| Dataset Splits | No | The paper does not explicitly define specific training, validation, and test dataset splits with percentages, sample counts, or citations to predefined splits for its models. It describes data collection for training video prediction models and then training RL agents within environments. |
| Hardware Specification | Yes | All models are trained on TPUv3-8 instances which are approximately similar to 4 Nvidia V100 GPUs. DMC agents were trained using 1 Nvidia V100 GPU, while Atari and RLBench agents were trained using 1 Nvidia A100 GPU. |
| Software Dependencies | No | The paper mentions the models and algorithms used (e.g., DrQ, DreamerV3, VQ-GAN, VideoGPT) but does not specify software dependencies such as programming languages, libraries, or frameworks with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | Further details on model architecture and hyperparameters can be found in Appendix A.2. All models are trained on TPUv3-8 instances which are approximately similar to 4 Nvidia V100 GPUs. Appendix A.2 (Tables 1 and 2) lists hyperparameters such as input size, latent size, batch size, learning rate, training steps, and model dimensions. Appendix A.3 (Tables 3 and 4) gives hyperparameters and training details for DreamerV3 and DrQ, including replay capacity, batch size, batch length, MLP size, learning rate, and other algorithm-specific settings. |
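
The pseudocode row above quotes the first step of Algorithm 1. The core idea it summarizes is that, after training a video prediction model on expert videos, the model's log-likelihood of the agent's observed frames is used as the reward signal (optionally mixed with an exploration bonus). Below is a minimal Python sketch of that reward computation under stated assumptions: the `VideoModel` class and its `log_prob` method are hypothetical stand-ins for a pretrained autoregressive video model, not the authors' implementation or API.

```python
# Hedged sketch of a VIPER-style reward: reward the agent for frames that are
# likely under a video prediction model trained on expert videos.
# `VideoModel.log_prob` is a placeholder; a real model (e.g. a VideoGPT-style
# autoregressive model) would return log p_theta(frame | context).
import numpy as np


class VideoModel:
    """Hypothetical stand-in for a pretrained video prediction model p_theta."""

    def log_prob(self, context: np.ndarray, frame: np.ndarray) -> float:
        # Dummy score so the sketch runs end to end: higher when the new frame
        # is close to the last context frame. A real model would evaluate the
        # learned conditional likelihood instead.
        return float(-np.mean((frame - context[-1]) ** 2))


def viper_reward(model: VideoModel,
                 context: np.ndarray,
                 frame: np.ndarray,
                 exploration_bonus: float = 0.0,
                 beta: float = 1.0) -> float:
    """Log-likelihood of the observed frame under the video prior, plus an
    optional exploration term weighted by beta (placeholder weighting)."""
    return model.log_prob(context, frame) + beta * exploration_bonus


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    context = rng.random((4, 64, 64, 3))  # last 4 observed frames
    frame = rng.random((64, 64, 3))       # current frame from the agent
    print(viper_reward(VideoModel(), context, frame))
```

In this sketch the reward is computed per timestep from frames alone, which matches the report's observation that VIPER needs no task reward; the exploration bonus and its weight are placeholders standing in for the paper's full objective.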