Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Videos are Sample-Efficient Supervisions: Behavior Cloning from Videos via Latent Representations

Authors: Xin Liu, Haoran Li, Dongbin Zhao

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on a set of challenging visual control tasks, including 16 discrete control tasks from the Procgen benchmark [30] and 12 continuous control tasks from the Deepmind Control suite (DMControl) [31] and Metaworld [32].
Researcher Affiliation | Academia | Xin Liu (1,2), Haoran Li (1,2), Dongbin Zhao (1,2). (1) State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences; (2) School of Artificial Intelligence, University of Chinese Academy of Sciences.
Pseudocode | Yes | We provide a diagram in Figure 2 and pseudocode in Appendix A. ... A.1 Pseudo Code Algorithm 1 The pseudo code of the proposed BCV-LR.
Open Source Code | Yes | We provide the implementation of BCV-LR at https://github.com/liuxin0824/BCV-LR.
Open Datasets | Yes | We conduct extensive experiments on a set of challenging visual control tasks, including 16 discrete control tasks from the Procgen benchmark [30] and 12 continuous control tasks from the Deepmind Control suite (DMControl) [31] and Metaworld [32]. The expert video dataset containing 8M steps is generated by well-trained RL agents, provided by [21].
Dataset Splits | No | The paper mentions using an "expert video dataset containing 8M steps" and interacting with environments for a limited number of "environmental steps" (e.g., 100k, 50k, 20k), but it does not specify explicit training/validation/test splits for these datasets or describe how they are partitioned beyond the online learning context.
Hardware Specification | Yes | The experiments of BCV-LR are conducted using V100 or A800 GPUs, and the complete workflow for each task can be finished within five hours on a single GPU.
Software Dependencies | No | The paper provides detailed hyper-parameter tables but does not explicitly list software dependencies (e.g., the programming language version, or libraries such as PyTorch, TensorFlow, or scikit-learn with version numbers).
Experiment Setup | Yes | Appendix C.2 provides detailed hyper-parameter settings in Table 11 ("Default hyper-parameter settings in discrete Procgen") and Table 12 ("Default hyper-parameter settings in DMControl").