LIV: Language-Image Representations and Rewards for Robotic Control

Authors: Yecheng Jason Ma, Vikash Kumar, Amy Zhang, Osbert Bastani, Dinesh Jayaraman

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform extensive experimental evaluations on several simulated and real-world household robotic manipulation settings. Our experiments evaluate LIV vision-language representations not only in their capacity as input state representations for language-conditioned behavior cloning of task policies, but also to directly ground language-based task specifications into visual state-based rewards for robot trajectory optimization (see the reward sketch after this table). In many cases, the pre-trained LIV model, without ever seeing robots in its pre-training human video dataset, can zero-shot produce dense language-conditioned reward on unseen robot videos.
Researcher Affiliation | Collaboration | University of Pennsylvania and Meta AI.
Pseudocode | Yes | Pseudocode is presented in Algorithm 1.
Open Source Code | Yes | The LIV model and training code are released at github.com/penn-pal-lab/LIV (a hypothetical usage sketch follows this table).
Open Datasets | Yes | We pre-train LIV on Epic Kitchen (Damen et al., 2018), a text-annotated ego-centric video dataset of humans completing tasks in diverse household kitchens; this dataset consists of 90k video segments, totalling 20M frames and 20k unique text annotations, and offers diverse camera views and action-centric videos, making it an ideal choice for vision-language pre-training.
Dataset Splits | No | The paper mentions evaluating on the 'test split' of Epic Kitchen and using the 'best training checkpoints', which implies a validation procedure, but it does not specify how validation sets were constructed (e.g., split percentages, sample counts, or an explicit splitting methodology) for the datasets used in its experiments.
Hardware Specification | Yes | The pre-training takes place on a node of 8 NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions using the CLIP architecture, a ResNet-50 image encoder, the CLIP Transformer text encoder, and the Adam optimizer, but it does not specify version numbers for any software libraries or frameworks (e.g., PyTorch, TensorFlow) or for the CLIP implementation itself (see the dependency sketch after this table).
Experiment Setup | Yes | Table 2: VIP Architecture & Pre-Training Hyperparameters; Table 4: LCBC Hyperparameters.
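
To make the reward-grounding claim in the Research Type row concrete, below is a minimal sketch of turning shared vision-language embeddings into a dense, language-conditioned reward signal. The `language_conditioned_rewards` helper and the difference-of-similarities shaping are illustrative assumptions; the paper's exact reward definition may differ.

```python
import torch
import torch.nn.functional as F

def language_conditioned_rewards(frame_embs: torch.Tensor,
                                 text_emb: torch.Tensor) -> torch.Tensor:
    """Dense per-frame rewards from embedding similarity.

    frame_embs: (T, D) embeddings of the T frames of a robot video.
    text_emb:   (D,)   embedding of the language task description.
    Both are assumed to live in a shared vision-language embedding space
    such as the one LIV pre-trains.
    """
    # Cosine similarity between every frame and the task description.
    sims = F.cosine_similarity(frame_embs, text_emb.unsqueeze(0), dim=-1)  # (T,)
    # One common shaping choice: reward the increase in similarity between
    # consecutive frames (potential-based shaping). This is an assumption,
    # not necessarily the exact formulation used in the paper.
    return sims[1:] - sims[:-1]  # (T-1,)

# Illustrative usage with random tensors standing in for LIV outputs.
T, D = 50, 1024
frame_embs = torch.randn(T, D)
text_emb = torch.randn(D)
print(language_conditioned_rewards(frame_embs, text_emb).shape)  # torch.Size([49])
```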
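
For the released repository noted in the Open Source Code row, a hypothetical usage sketch follows. The `load_liv` import path, the `modality` keyword, and the tokenization step are assumptions modeled on the authors' earlier VIP release and may not match the repository's actual API.

```python
# Hypothetical usage sketch: load_liv() and the modality keyword are
# assumptions about the repo's interface, not a confirmed API.
import torch
import clip
from liv import load_liv  # assumed to be installable from github.com/penn-pal-lab/LIV

liv = load_liv()  # assumed helper that downloads/loads the pre-trained checkpoint
liv.eval()

frames = torch.zeros(1, 3, 224, 224)                 # a preprocessed RGB frame
tokens = clip.tokenize(["open the microwave door"])  # task description
with torch.no_grad():
    img_emb = liv(input=frames, modality="vision")   # assumed call signature
    txt_emb = liv(input=tokens, modality="text")     # assumed call signature
```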
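
For the Software Dependencies row, the sketch below shows one way to instantiate the kind of backbone the paper names (a CLIP model with a ResNet-50 image encoder and Transformer text encoder, trained with Adam) while recording the library versions the paper omits. It uses the open-source OpenAI CLIP package; the learning rate and dummy inputs are placeholders, not the paper's values.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

# Record the framework versions that the paper leaves unspecified.
print("torch:", torch.__version__, "| cuda:", torch.version.cuda)

device = "cuda" if torch.cuda.is_available() else "cpu"
# "RN50" pairs a ResNet-50 image encoder with the CLIP Transformer text encoder.
model, preprocess = clip.load("RN50", device=device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)  # placeholder learning rate

image = torch.zeros(1, 3, 224, 224, device=device)         # preprocessed dummy frame
tokens = clip.tokenize(["open the microwave door"]).to(device)
with torch.no_grad():
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(tokens)
print(img_emb.shape, txt_emb.shape)  # both project to the shared 1024-d space for RN50
```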