Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos

Authors: Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune

NeurIPS 2022

Reproducibility Variable Result LLM Response
Research Type | Experimental | "We show that this behavioral prior has nontrivial zero-shot capabilities and that it can be fine-tuned, with both imitation learning and reinforcement learning, to hard-exploration tasks that are impossible to learn from scratch via reinforcement learning. For many tasks our models exhibit human-level performance, and we are the first to report computer agents that can craft diamond tools, which can take proficient humans upwards of 20 minutes (24,000 environment actions) of gameplay to accomplish."
Researcher Affiliation | Collaboration | Bowen Baker (bowen@openai.com), Ilge Akkaya (ilge@openai.com), Peter Zhokhov (peterz@openai.com), Joost Huizinga (joost@openai.com), Jie Tang (jietang@openai.com), Adrien Ecoffet (adrien@openai.com), Brandon Houghton (brandon@openai.com), Raul Sampedro (raulsamg@gmail.com), Jeff Clune (jclune@gmail.com); OpenAI and University of British Columbia.
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "we open source our contractor data, trained model weights, and Minecraft environment for future research into learning to act via semi-supervised imitation learning at scale."
Open Datasets | Yes | "we open source our contractor data, trained model weights, and Minecraft environment for future research into learning to act via semi-supervised imitation learning at scale."
Dataset Splits | No | The paper mentions using a "held-out validation set" and discusses "validation loss" but does not provide specific details on the dataset split percentages or sample counts for validation.
Hardware Specification | Yes | "Preliminary model scaling experiments suggested that our model could benefit from 30 epochs of training and that a 0.5 billion parameter model was required to stay in the efficient learning regime [64] for that training duration (Appendix H shows results comparing model size and the benefit of scaling to 0.5B parameters), which took 9 days on 720 V100 GPUs."
Software Dependencies | No | The paper mentions various software components in its references (e.g., PyTorch, scikit-learn) but does not provide specific version numbers for the software dependencies used in its experimental setup within the main text.
Experiment Setup | Yes | "Preliminary model scaling experiments suggested that our model could benefit from 30 epochs of training and that a 0.5 billion parameter model was required to stay in the efficient learning regime [64] for that training duration..."
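
For a quick sense of scale, here is a minimal back-of-envelope sketch (in Python) that turns the figures quoted in the table into two derived quantities: total pretraining compute in GPU-days and the environment action rate implied by the diamond-tools milestone. The V100 throughput and utilization values are illustrative assumptions, not numbers reported in the paper.

```python
# Back-of-envelope scale check using only figures quoted above.

# Pretraining compute (GPU count and wall-clock time from the paper)
gpus = 720   # V100 GPUs
days = 9     # wall-clock training time
gpu_days = gpus * days
print(f"Training compute: {gpu_days:,} GPU-days")  # 6,480 GPU-days

# Rough FLOP estimate -- the peak throughput and utilization below
# are assumptions for illustration, not figures from the paper.
v100_peak_flops = 125e12  # assumed ~125 TFLOP/s FP16 tensor-core peak
utilization = 0.3         # assumed fraction of peak actually sustained
total_flops = gpus * days * 86_400 * v100_peak_flops * utilization
print(f"~{total_flops:.1e} total FLOPs")  # on the order of 1e22

# Action rate implied by the diamond-tools figure quoted above
actions = 24_000  # environment actions (from the paper)
minutes = 20      # proficient-human gameplay time (from the paper)
print(f"{actions / (minutes * 60):.0f} actions per second")  # 20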