Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Authors: Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that this behavioral prior has nontrivial zero-shot capabilities and that it can be fine-tuned, with both imitation learning and reinforcement learning, to hard-exploration tasks that are impossible to learn from scratch via reinforcement learning. For many tasks our models exhibit human-level performance, and we are the first to report computer agents that can craft diamond tools, which can take proficient humans upwards of 20 minutes (24,000 environment actions) of gameplay to accomplish. |
| Researcher Affiliation | Collaboration | Bowen Baker (bowen@openai.com), Ilge Akkaya (ilge@openai.com), Peter Zhokhov (peterz@openai.com), Joost Huizinga (joost@openai.com), Jie Tang (jietang@openai.com), Adrien Ecoffet (adrien@openai.com), Brandon Houghton (brandon@openai.com), Raul Sampedro (raulsamg@gmail.com), Jeff Clune (jclune@gmail.com). OpenAI; University of British Columbia |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | we open source our contractor data, trained model weights, and Minecraft environment for future research into learning to act via semi-supervised imitation learning at scale. |
| Open Datasets | Yes | we open source our contractor data, trained model weights, and Minecraft environment for future research into learning to act via semi-supervised imitation learning at scale. |
| Dataset Splits | No | The paper mentions using a 'held-out validation set' and discusses 'validation loss' but does not provide specific details on the dataset split percentages or sample counts for validation. |
| Hardware Specification | Yes | Preliminary model scaling experiments suggested that our model could benefit from 30 epochs of training and that a 0.5 billion parameter model was required to stay in the efficient learning regime [64] for that training duration (Appendix H shows results comparing model size and the benefit of scaling to 0.5B parameters), which took 9 days on 720 V100 GPUs. |
| Software Dependencies | No | The paper mentions various software components in its references (e.g., PyTorch, scikit-learn) but does not provide specific version numbers for the software dependencies used in their experimental setup within the main text. |
| Experiment Setup | Yes | Preliminary model scaling experiments suggested that our model could benefit from 30 epochs of training and that a 0.5 billion parameter model was required to stay in the efficient learning regime [64] for that training duration... |
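The figures quoted above support two quick sanity checks: 24,000 environment actions over 20 minutes works out to 20 actions per second (consistent with Minecraft's standard 20 Hz game tick, an assumption not stated in the quoted excerpts), and 9 days on 720 V100 GPUs comes to roughly 155,000 GPU-hours. A minimal sketch of that arithmetic:

```python
# Back-of-the-envelope checks on the figures quoted in the table above.
# The 20 Hz game-tick comparison is an assumption, not stated in the paper excerpts.

# Diamond-tool benchmark: ~20 minutes of proficient human gameplay,
# quoted as 24,000 environment actions.
minutes = 20
actions = 24_000
actions_per_second = actions / (minutes * 60)
print(f"Implied action rate: {actions_per_second:.0f} actions/sec")  # 20

# Foundation-model pretraining: 9 days on 720 V100 GPUs.
gpus = 720
days = 9
gpu_hours = gpus * days * 24
print(f"Total training compute: {gpu_hours:,} V100 GPU-hours")  # 155,520
```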