Zero-Shot Robotic Manipulation with Pre-Trained Image-Editing Diffusion Models

Authors: Kevin Black, Mitsuhiko Nakamoto, Pranav Atreya, Homer Rich Walke, Chelsea Finn, Aviral Kumar, Sergey Levine

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "5 EXPERIMENTAL EVALUATION" |
| Researcher Affiliation | Collaboration | University of California, Berkeley; Stanford University; Google DeepMind |
| Pseudocode | Yes | "Algorithm 1 SuSIE: Zero-Shot, Test-Time Execution" |
| Open Source Code | No | "The project website can be found at http://rail-berkeley.github.io/susie." |
| Open Datasets | Yes | "Our dataset is Bridge Data V2 [59], a large and diverse dataset of robotic manipulation behaviors designed for evaluating open-vocabulary instructions. ... Our video-only dataset D_l is the Something-Something dataset [19], a dataset consisting of short video clips of humans manipulating various objects." |
| Dataset Splits | No | No explicit percentages or absolute sample counts are given for training, validation, and test splits of the main datasets, although environment splits are noted for CALVIN. |
| Hardware Specification | Yes | "We train for 40k steps with a batch size of 1024 on a single v4-64 TPU pod, which takes 17 hours. ... We train with a batch size of 256 for 445k steps on a single v4-8 TPU VM, which takes 15 hours." |
| Software Dependencies | No | The paper names software components such as Instruct Pix2Pix, OWL-ViT, Flan-T5-Base, CLIP, MUSE, and the DDIM sampler, but provides no version numbers for these or other key dependencies. |
| Experiment Setup | Yes | "We finetune Instruct Pix2Pix [9] using similar hyperparameters to the initial Instruct Pix2Pix training. We use the AdamW optimizer [40] with a learning rate of 1e-4, a linear warmup of 800 steps, and weight decay of 0.01. ... At test time, we use an image guidance weight of 2.5 and a text guidance weight of 7.5. We use the DDIM sampler [56] with 50 sampling steps. ... We use the Adam optimizer [36] with a learning rate of 3e-4 and a linear warmup of 2000 steps. We train with a batch size of 256 for 445k steps ... We augment the observation and goal with random crops, random resizing, and color jitter." |
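The image guidance weight of 2.5 and text guidance weight of 7.5 quoted in the setup follow the two-way classifier-free guidance rule from Instruct Pix2Pix, which combines three U-Net noise predictions at each sampling step. Below is a minimal sketch of that combination; the function and variable names are illustrative, not taken from the paper's code.

```python
import numpy as np

def combine_guidance(eps_uncond, eps_image, eps_full,
                     s_image=2.5, s_text=7.5):
    """Two-way classifier-free guidance (Instruct Pix2Pix style).

    eps_uncond: noise prediction with neither image nor text conditioning
    eps_image:  noise prediction with image conditioning only
    eps_full:   noise prediction with both image and text conditioning
    s_image:    image guidance weight (2.5 in the quoted setup)
    s_text:     text guidance weight (7.5 in the quoted setup)
    """
    return (eps_uncond
            + s_image * (eps_image - eps_uncond)
            + s_text * (eps_full - eps_image))

# Toy usage with dummy 2x2 "noise maps" in place of real U-Net outputs:
eps_uncond = np.zeros((2, 2))
eps_image = np.ones((2, 2))
eps_full = np.full((2, 2), 2.0)
guided = combine_guidance(eps_uncond, eps_image, eps_full)
print(guided)  # each element: 0 + 2.5*(1-0) + 7.5*(2-1) = 10.0
```

In practice this combined prediction would feed the DDIM update at each of the 50 sampling steps mentioned above.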
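The two linear warmups in the quoted setup (800 steps to reach lr 1e-4 for diffusion finetuning; 2000 steps to reach lr 3e-4 for policy training) can be sketched as a simple schedule helper. This is an illustrative reconstruction of a standard linear-warmup schedule, not the authors' implementation.

```python
def warmup_lr(step, base_lr, warmup_steps):
    """Linearly ramp the learning rate from 0 to base_lr over
    warmup_steps optimizer steps, then hold it constant."""
    if step >= warmup_steps:
        return base_lr
    return base_lr * (step / warmup_steps)

# Diffusion finetuning schedule from the quoted setup: lr 1e-4, 800-step warmup.
print(warmup_lr(400, 1e-4, 800))    # halfway through warmup -> ~5e-05
# Policy training schedule: lr 3e-4, 2000-step warmup, held after warmup ends.
print(warmup_lr(10_000, 3e-4, 2000))
```

The paper does not say whether the rate decays after warmup, so this sketch holds it constant.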