Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
RoboCLIP: One Demonstration is Enough to Learn Robot Policies
Authors: Sumedh Sontakke, Jesse Zhang, Séb Arnold, Karl Pertsch, Erdem Bıyık, Dorsa Sadigh, Chelsea Finn, Laurent Itti
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate RoboCLIP on the Metaworld Environment suite [Yu et al., 2020] and on the Franka Kitchen Environment [Gupta et al., 2019], and find that policies obtained by pretraining on the RoboCLIP reward result in 2-3× higher zero-shot task success in comparison to state-of-the-art imitation learning baselines." and "4 Experiments: We test out each of the hypotheses defined in Section 1 on simulated robotic environments." |
| Researcher Affiliation | Collaboration | 1 Thomas Lord Department of Computer Science, University of Southern California; 2 University of California, Berkeley; 3 Stanford University; 4 Google Research |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper mentions 'Visit our website for experiment videos.' but does not state that the source code for the methodology is available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | "We evaluate RoboCLIP on the Metaworld Environment suite [Yu et al., 2020] and on the Franka Kitchen Environment [Gupta et al., 2019]" and "The backbone model used in RoboCLIP is S3D [Xie et al., 2018] trained on the Howto100M dataset [Miech et al., 2019]." |
| Dataset Splits | No | The paper describes pretraining and finetuning phases, and mentions zero-shot evaluation, but does not specify explicit train, validation, or test dataset splits with percentages or sample counts for its experiments. |
| Hardware Specification | No | The paper does not specify the hardware (e.g., GPU, CPU models, or cloud computing instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions using PPO [Schulman et al., 2017] but does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | No | The paper states that agents are trained with PPO but does not provide specific experimental setup details such as hyperparameter values, learning rates, batch sizes, or network configurations. |