Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

Authors: Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune

NeurIPS 2022

Reproducibility Variable Result LLM Response
Research Type | Experimental | "We show that this behavioral prior has nontrivial zero-shot capabilities and that it can be fine-tuned, with both imitation learning and reinforcement learning, to hard-exploration tasks that are impossible to learn from scratch via reinforcement learning. For many tasks our models exhibit human-level performance, and we are the first to report computer agents that can craft diamond tools, which can take proficient humans upwards of 20 minutes (24,000 environment actions) of gameplay to accomplish."
Researcher Affiliation | Collaboration | Bowen Baker (bowen@openai.com), Ilge Akkaya (ilge@openai.com), Peter Zhokhov (peterz@openai.com), Joost Huizinga (joost@openai.com), Jie Tang (jietang@openai.com), Adrien Ecoffet (adrien@openai.com), Brandon Houghton (brandon@openai.com), Raul Sampedro (raulsamg@gmail.com), Jeff Clune (jclune@gmail.com); OpenAI and University of British Columbia.
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "we open source our contractor data, trained model weights, and Minecraft environment for future research into learning to act via semi-supervised imitation learning at scale."
Open Datasets | Yes | "we open source our contractor data, trained model weights, and Minecraft environment for future research into learning to act via semi-supervised imitation learning at scale."
Dataset Splits | No | The paper mentions using a "held-out validation set" and discusses "validation loss" but does not provide specific details on the dataset split percentages or sample counts for validation.
Hardware Specification | Yes | "Preliminary model scaling experiments suggested that our model could benefit from 30 epochs of training and that a 0.5 billion parameter model was required to stay in the efficient learning regime [64] for that training duration (Appendix H shows results comparing model size and the benefit of scaling to 0.5B parameters), which took 9 days on 720 V100 GPUs."
Software Dependencies | No | The paper mentions various software components in its references (e.g., PyTorch, scikit-learn) but does not provide specific version numbers for the software dependencies used in its experimental setup within the main text.
Experiment Setup | Yes | "Preliminary model scaling experiments suggested that our model could benefit from 30 epochs of training and that a 0.5 billion parameter model was required to stay in the efficient learning regime [64] for that training duration..."
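
For a quick sense of scale, here is a minimal back-of-envelope sketch (in Python) that turns the figures quoted in the table into two derived quantities: total pretraining compute in GPU-days and the environment action rate implied by the diamond-tools milestone. The V100 throughput and utilization values are illustrative assumptions, not numbers reported in the paper.

```python
# Back-of-envelope scale check using only figures quoted above.

# Pretraining compute (GPU count and wall-clock time from the paper)
gpus = 720   # V100 GPUs
days = 9     # wall-clock training time
gpu_days = gpus * days
print(f"Training compute: {gpu_days:,} GPU-days")  # 6,480 GPU-days

# Rough FLOP estimate -- the peak throughput and utilization below
# are assumptions for illustration, not figures from the paper.
v100_peak_flops = 125e12  # assumed ~125 TFLOP/s FP16 tensor-core peak
utilization = 0.3         # assumed fraction of peak actually sustained
total_flops = gpus * days * 86_400 * v100_peak_flops * utilization
print(f"~{total_flops:.1e} total FLOPs")  # on the order of 1e22

# Action rate implied by the diamond-tools figure quoted above
actions = 24_000  # environment actions (from the paper)
minutes = 20      # proficient-human gameplay time (from the paper)
print(f"{actions / (minutes * 60):.0f} actions per second")  # 20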