Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Zero-Shot Robotic Manipulation with Pre-Trained Image-Editing Diffusion Models
Authors: Kevin Black, Mitsuhiko Nakamoto, Pranav Atreya, Homer Rich Walke, Chelsea Finn, Aviral Kumar, Sergey Levine
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTAL EVALUATION |
| Researcher Affiliation | Collaboration | ¹University of California, Berkeley; ²Stanford University; ³Google DeepMind |
| Pseudocode | Yes | Algorithm 1 SuSIE: Zero-Shot, Test-Time Execution |
| Open Source Code | No | The project website can be found at http://rail-berkeley.github.io/susie. |
| Open Datasets | Yes | Our dataset is BridgeData V2 [59], a large and diverse dataset of robotic manipulation behaviors designed for evaluating open-vocabulary instructions. ... Our video-only dataset Dl is the Something-Something dataset [19], a dataset consisting of short video clips of humans manipulating various objects. |
| Dataset Splits | No | No explicit percentages or absolute sample counts for training, validation, and test splits were provided within the paper for general datasets, although environmental splits were noted for CALVIN. |
| Hardware Specification | Yes | We train for 40k steps with a batch size of 1024 on a single v4-64 TPU pod, which takes 17 hours. ... We train with a batch size of 256 for 445k steps on a single v4-8 TPU VM, which takes 15 hours. |
| Software Dependencies | No | The paper mentions software components like InstructPix2Pix, OWL-ViT, Flan-T5-Base, CLIP, MUSE, and DDIM sampler, but does not provide specific version numbers for these or other key software dependencies. |
| Experiment Setup | Yes | We finetune InstructPix2Pix [9] using similar hyperparameters to the initial InstructPix2Pix training. We use the AdamW optimizer [40] with a learning rate of 1e-4, a linear warmup of 800 steps, and weight decay of 0.01. ... At test time, we use an image guidance weight of 2.5 and a text guidance weight of 7.5. We use the DDIM sampler [56] with 50 sampling steps. ... We use the Adam optimizer [36] with a learning rate of 3e-4 and a linear warmup of 2000 steps. We train with a batch size of 256 for 445k steps ... We augment the observation and goal with random crops, random resizing, and color jitter. |
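
The hyperparameters quoted in the Experiment Setup row can be collected into a small, dependency-free sketch. This is illustrative only and is not the authors' code. The names `OptimizerConfig`, `linear_warmup_lr`, and `guided_noise` are hypothetical. The warmup shape (ramp from zero, then hold constant) is an assumption, because the row states only the warmup length. The guidance combination follows the two-condition classifier-free guidance formulation published with InstructPix2Pix; whether the paper's sampler uses exactly this form is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OptimizerConfig:
    """Optimizer settings quoted in the Experiment Setup row (hypothetical container)."""
    peak_lr: float
    warmup_steps: int
    weight_decay: Optional[float]  # None means the row does not state a value


# Values copied from the row; the constant names are made up for this sketch.
DIFFUSION_FINETUNE = OptimizerConfig(peak_lr=1e-4, warmup_steps=800, weight_decay=0.01)
POLICY_TRAINING = OptimizerConfig(peak_lr=3e-4, warmup_steps=2000, weight_decay=None)


def linear_warmup_lr(step: int, cfg: OptimizerConfig) -> float:
    """Assumed schedule: ramp linearly from 0 to peak_lr, then hold peak_lr."""
    if step >= cfg.warmup_steps:
        return cfg.peak_lr
    return cfg.peak_lr * step / cfg.warmup_steps


def guided_noise(eps_uncond, eps_image, eps_full, image_weight=2.5, text_weight=7.5):
    """Combine three noise predictions with separate image and text guidance weights.

    eps_uncond: prediction with neither condition
    eps_image:  prediction with the image condition only
    eps_full:   prediction with both image and text conditions
    Inputs may be floats or arrays that support + and *.
    """
    return (
        eps_uncond
        + image_weight * (eps_image - eps_uncond)
        + text_weight * (eps_full - eps_image)
    )
```

With both weights set to 1, `guided_noise` reduces to the fully conditioned prediction; weights above 1 push the sample further toward the conditioning image and the instruction text, respectively.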