Video Prediction Models as Rewards for Reinforcement Learning
Authors: Alejandro Escontrela, Ademi Adeniji, Wilson Yan, Ajay Jain, Xue Bin Peng, Ken Goldberg, Youngwoon Lee, Danijar Hafner, Pieter Abbeel
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform an extensive evaluation, and show that VIPER can achieve expert-level control without task rewards on 15 DMC tasks [44], 6 RLBench tasks [20], and 7 Atari tasks [3] (see examples in Figure 2 and Appendix A.5). |
| Researcher Affiliation | Academia | University of California, Berkeley |
| Pseudocode | Yes | Algorithm 1 (VIPER): "Train video prediction model pθ on expert videos." A minimal sketch of the resulting reward appears after this table. |
| Open Source Code | Yes | Source code and datasets are available on the project website: https://escontrela.me/viper |
| Open Datasets | Yes | We utilize 15 tasks from the Deep Mind Control (DMC) suite [44], 7 tasks from the Atari Gym suite [4], and 6 tasks from the Robot Learning Benchmark (RLBench) [20]. |
| Dataset Splits | No | The paper does not explicitly define specific training, validation, and test dataset splits with percentages, sample counts, or citations to predefined splits for its models. It describes data collection for training video prediction models and then training RL agents within environments. |
| Hardware Specification | Yes | All models are trained on TPUv3-8 instances which are approximately similar to 4 Nvidia V100 GPUs. DMC agents were trained using 1 Nvidia V100 GPU, while Atari and RLBench agents were trained using 1 Nvidia A100 GPU. |
| Software Dependencies | No | The paper mentions the models and algorithms used (e.g., DrQ, DreamerV3, VQ-GAN, VideoGPT) but does not specify software dependencies such as programming languages, libraries, or frameworks with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | Further details on model architecture and hyperparameters can be found in Appendix A.2. All models are trained on TPUv3-8 instances which are approximately similar to 4 Nvidia V100 GPUs. Appendix A.2 (Tables 1 and 2) lists hyperparameters such as input size, latent size, batch size, learning rate, training steps, and model dimensions. Appendix A.3 (Tables 3 and 4) gives hyperparameters and training details for DreamerV3 and DrQ, including replay capacity, batch size, batch length, MLP size, learning rate, and other algorithm-specific settings. |
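
The pseudocode row above quotes the first step of Algorithm 1. The core idea it summarizes is that, after training a video prediction model on expert videos, the model's log-likelihood of the agent's observed frames is used as the reward signal (optionally mixed with an exploration bonus). Below is a minimal Python sketch of that reward computation under stated assumptions: the `VideoModel` class and its `log_prob` method are hypothetical stand-ins for a pretrained autoregressive video model, not the authors' implementation or API.

```python
# Hedged sketch of a VIPER-style reward: reward the agent for frames that are
# likely under a video prediction model trained on expert videos.
# `VideoModel.log_prob` is a placeholder; a real model (e.g. a VideoGPT-style
# autoregressive model) would return log p_theta(frame | context).
import numpy as np


class VideoModel:
    """Hypothetical stand-in for a pretrained video prediction model p_theta."""

    def log_prob(self, context: np.ndarray, frame: np.ndarray) -> float:
        # Dummy score so the sketch runs end to end: higher when the new frame
        # is close to the last context frame. A real model would evaluate the
        # learned conditional likelihood instead.
        return float(-np.mean((frame - context[-1]) ** 2))


def viper_reward(model: VideoModel,
                 context: np.ndarray,
                 frame: np.ndarray,
                 exploration_bonus: float = 0.0,
                 beta: float = 1.0) -> float:
    """Log-likelihood of the observed frame under the video prior, plus an
    optional exploration term weighted by beta (placeholder weighting)."""
    return model.log_prob(context, frame) + beta * exploration_bonus


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    context = rng.random((4, 64, 64, 3))  # last 4 observed frames
    frame = rng.random((64, 64, 3))       # current frame from the agent
    print(viper_reward(VideoModel(), context, frame))
```

In this sketch the reward is computed per timestep from frames alone, which matches the report's observation that VIPER needs no task reward; the exploration bonus and its weight are placeholders standing in for the paper's full objective.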