Position: Video as the New Language for Real-World Decision Making

Authors: Sherry Yang, Jacob C Walker, Jack Parker-Holder, Yilun Du, Jake Bruce, Andre Barreto, Pieter Abbeel, Dale Schuurmans

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To further illustrate how video generation can have a profound impact on real-world applications, we provide an in-depth analysis of recent work that utilizes video generation as task solvers, answers to questions, policies/agents, and environment simulators through techniques such as instruction tuning, in-context learning, planning, and reinforcement learning in settings such as games, robotics, self-driving, and science. Details of the models used to generate the examples can be found in Appendix A. Additional generated videos can be found in Appendix B.
Researcher Affiliation | Collaboration | Google DeepMind, UC Berkeley, MIT.
Pseudocode | No | The paper describes model architectures and processes in narrative text and references existing models, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing the code for this work, nor does it provide a direct link to a source-code repository.
Open Datasets | Yes | The paper uses the contractor data from Baker et al. (2022), trains a video generation model on the Open X-Embodiment dataset (Padalkar et al., 2023), and uses STEM data collected from Schwarzer et al. (2023).
Dataset Splits | No | The paper uses various datasets for its examples but does not specify training, validation, and test splits (e.g., percentages, sample counts, or explicit standard splits with citations) needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU or CPU models or memory amounts, used to run its experiments or generate its examples.
Software Dependencies | No | The paper references various models and architectures but does not list software dependencies with version numbers (e.g., PyTorch 1.9, CUDA 11.1) needed to replicate the experiments.
Experiment Setup | Yes | The lower-resolution video generation model operates at resolution [24, 40], followed by two spatial super-resolution models with target resolutions [48, 80] and [192, 320]. Classifier-free guidance (Ho & Salimans, 2022) was applied for text or action conditioning. ... Our MaskGIT implementation uses 8 steps with a cosine masking schedule.
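
The experiment-setup quote above names three components: a resolution cascade, classifier-free guidance, and MaskGIT decoding. The sketches below illustrate each one under stated assumptions; none of them is the authors' implementation. First, a minimal sketch of the cascaded pipeline, where `base_model` and `sr_models` are hypothetical callables:

```python
# Hypothetical cascaded video generation: a low-resolution base model
# followed by two spatial super-resolution stages, per the quoted setup.
def cascaded_generate(base_model, sr_models, cond):
    # Base stage samples a video at 24x40 (height, width).
    video = base_model(cond, resolution=(24, 40))
    # Each super-resolution stage upsamples to its target resolution,
    # conditioned on the lower-resolution video and the same `cond`.
    for sr_model, resolution in zip(sr_models, [(48, 80), (192, 320)]):
        video = sr_model(video, cond, resolution=resolution)
    return video
```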
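
Next, classifier-free guidance as cited (Ho & Salimans, 2022). This is a generic sketch: the `denoise` callable, the guidance weight `w`, and the use of `None` for the null conditioning are assumptions, not details from the paper.

```python
def classifier_free_guidance(denoise, x_t, t, cond, w=2.0):
    """Combine conditional and unconditional denoiser outputs.

    `denoise(x_t, t, cond)` is a hypothetical model call returning the
    predicted noise for noisy video `x_t` at timestep `t`; cond=None
    stands in for the null conditioning dropped at training time. The
    guidance weight `w` trades sample diversity for adherence to the
    text or action conditioning.
    """
    eps_cond = denoise(x_t, t, cond)    # conditioned prediction
    eps_uncond = denoise(x_t, t, None)  # unconditioned prediction
    # Ho & Salimans (2022): eps = (1 + w) * eps_cond - w * eps_uncond
    return (1.0 + w) * eps_cond - w * eps_uncond
```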
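
Finally, the 8-step MaskGIT-style decoding with a cosine masking schedule. Only the step count and the schedule shape come from the quote; the token predictor, confidence scores, and MASK sentinel are assumptions for illustration.

```python
import math
import numpy as np

def cosine_mask_ratio(step, total_steps=8):
    # Fraction of tokens still masked after this step; the cosine
    # schedule cos(pi/2 * t) falls from ~1 to 0 as t goes 0 -> 1.
    return math.cos(0.5 * math.pi * (step + 1) / total_steps)

def maskgit_decode(predict_tokens, num_tokens, total_steps=8):
    """Iteratively fill in masked tokens over `total_steps` steps.

    `predict_tokens(tokens, mask)` is a hypothetical model call that
    returns (sampled_tokens, confidences) for every position; -1 marks
    masked slots in `tokens`.
    """
    MASK = -1
    tokens = np.full(num_tokens, MASK, dtype=np.int64)
    for step in range(total_steps):
        mask = tokens == MASK
        sampled, conf = predict_tokens(tokens, mask)
        tokens[mask] = sampled[mask]
        if step < total_steps - 1:
            # Re-mask the least confident predictions so the scheduled
            # fraction of positions stays masked for the next step.
            num_remask = int(num_tokens * cosine_mask_ratio(step, total_steps))
            conf = np.where(mask, conf, np.inf)  # never re-mask fixed tokens
            tokens[np.argsort(conf)[:num_remask]] = MASK
    return tokens
```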