Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation

Authors: Hongtao Wu, Ya Jing, Chilam Cheang, Guangzeng Chen, Jiafeng Xu, Xinghang Li, Minghuan Liu, Hang Li, Tao Kong

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive experiments on the challenging CALVIN benchmark and a real robot. On CALVIN benchmark, our method outperforms state-of-the-art baseline methods and improves the success rate from 88.9% to 94.9%.
Researcher Affiliation | Industry | Hongtao Wu, Ya Jing, Chilam Cheang, Guangzeng Chen, Jiafeng Xu, Xinghang Li, Minghuan Liu, Hang Li, Tao Kong. ByteDance Research. {wuhongtao.123,kongtao}@bytedance.com
Pseudocode | No | The paper describes the model architecture and training process using text and diagrams, but it does not include any formal pseudocode or algorithm blocks.
Open Source Code | Yes | Project page: https://GR1-Manipulation.github.io
Open Datasets | Yes | The data for the large-scale video generative pre-training are sourced from the recently proposed Ego4D dataset (Grauman et al., 2022), which contains massive-scale human-object interactions. We perform experiments on the challenging CALVIN benchmark (Mees et al., 2022c).
Dataset Splits | Yes | We perform experiments on two splits of data: ABCD→D and ABC→D. The training dataset contains over 20k expert trajectories paired with language instruction labels. To study data efficiency, we train on 10% of the full training dataset from the ABCD→D split. Specifically, we sample 66 trajectories for each of the 34 tasks, i.e., 2,244 trajectories, from the total 22,966 training trajectories. (A sampling sketch follows the table.)
Hardware Specification | No | In real robot experiments, we use a 7-DoF Kinova Gen2 robot mounted with a RealSense camera on its end-effector. A Kinect Azure camera is used to provide the static view of the scene. (This describes the real-robot hardware, but the paper does not specify the computational hardware, e.g., GPU models, CPU types, or memory, used to train the models.)
Software Dependencies | No | We apply dropout and use AdamW (Loshchilov & Hutter, 2017) with cosine learning rate decay (Loshchilov & Hutter, 2016) to optimize the network. (While the optimizer and components such as CLIP and MAE are mentioned, no version numbers for software libraries or frameworks, e.g., PyTorch, TensorFlow, Python, or CUDA, are provided.)
Experiment Setup | Yes | Hyperparameters for pre-training and finetuning on CALVIN data are shown in Table 3. Table 3 (pre-training / finetuning): batch size 1024 / 512; learning rate 3.6e-4 / 1e-3; dropout 0.1 / 0.1; optimizer AdamW / AdamW; learning rate schedule cosine decay / cosine decay; warmup epochs 5 / 1; training epochs 50 / 20. (A training-configuration sketch follows the table.)
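
For concreteness, here is a minimal sketch of how the Table 3 finetuning settings combine: AdamW at learning rate 1e-3 with cosine decay and a one-epoch warmup over 20 epochs. It assumes PyTorch, which the paper does not name as its framework; `model` and `steps_per_epoch` are placeholders, not values from the paper.

```python
import math
import torch

# Sketch of the Table 3 finetuning recipe (assumption: PyTorch).
model = torch.nn.Linear(512, 512)  # stand-in for the GR-1 network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

steps_per_epoch = 1000              # illustrative; taken from the dataloader in practice
warmup_steps = 1 * steps_per_epoch  # 1 warmup epoch (Table 3, finetuning)
total_steps = 20 * steps_per_epoch  # 20 training epochs

def lr_lambda(step):
    # Linear warmup for the first epoch, then cosine decay to zero.
    if step < warmup_steps:
        return step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
# In the training loop: call optimizer.step(), then scheduler.step(), once per batch.
```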
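The 10% data-efficiency subset from the Dataset Splits row can be sketched the same way: 66 trajectories drawn per task across 34 tasks (34 × 66 = 2,244) from the 22,966 ABCD→D training trajectories. The function and data layout below are a hypothetical reconstruction, not the authors' code.

```python
import random
from collections import defaultdict

def subsample_per_task(trajectories, per_task=66, seed=0):
    """Draw `per_task` trajectories for each task.

    `trajectories` is assumed to be a list of (task_name, trajectory) pairs.
    """
    rng = random.Random(seed)
    by_task = defaultdict(list)
    for task, traj in trajectories:
        by_task[task].append(traj)
    subset = []
    for task in sorted(by_task):  # sorted for deterministic ordering
        subset.extend(rng.sample(by_task[task], per_task))
    return subset  # 2,244 trajectories when there are 34 tasks
```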