Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators
Authors: Daniel Geng, Andrew Owens
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method's ability to manipulate image structure, both qualitatively and quantitatively, on real and generated images. Additional results can be found in Appendix A3. |
| Researcher Affiliation | Academia | Daniel Geng, Andrew Owens (University of Michigan) |
| Pseudocode | No | The paper describes its method in text and through equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://dangeng.github.io/motion_guidance |
| Open Datasets | Yes | We evaluate on two different datasets. The first dataset is composed of examples with handcrafted target flows, a subset of which can be seen in Figures 1, 2, 3, 4, and 7. This dataset has the advantage of containing interesting motions that are of practical interest. In addition, we can write highly specific instructions for the Instruct Pix2Pix baseline for a fair comparison. However, this dataset is curated to an extent. We ameliorate this by performing an additional evaluation on an automatically generated dataset based on KITTI (Geiger et al., 2012), which contains egocentric driving videos with labeled bounding boxes on cars. |
| Dataset Splits | No | The paper describes the datasets used (KITTI, curated dataset) but does not provide specific train/validation/test split percentages or sample counts. |
| Hardware Specification | Yes | All experiments are conducted on a single NVIDIA A40 GPU. |
| Software Dependencies | Yes | We use RAFT (Teed & Deng, 2020) as our flow model. ... For our experiments we use Stable Diffusion (Rombach et al., 2021). Rather than performing diffusion directly on pixels, Stable Diffusion performs diffusion in a latent space, with an encoder and decoder to convert between pixel and latent space. To accommodate this, we precompose the decoder with the motion guidance function, L(D(·)), so that the guidance function can accept latent codes. Additionally, we downsample our edit mask to 64×64, the spatial size of the Stable Diffusion latent space. ... We use Stable Diffusion v1.4 with a DDIM sampler for 500 steps, and we generate images at a resolution of 512×512. (A schematic sketch of this decoder composition appears after the table.) |
| Experiment Setup | Yes | We use Stable Diffusion v1.4 with a DDIM sampler for 500 steps, and we generate images at a resolution of 512×512. All experiments are conducted on a single NVIDIA A40 GPU. For our motion guidance function (Eq. 4) we found that setting λ_color to 100 and λ_flow to 3 worked well. In addition, in our implementation we scale the guidance gradients by a global weight of 300. We set the gradient clipping threshold c_g to be 200 and take K = 10 recursive denoising steps. (The sampling-loop sketch after the table illustrates these settings.) |
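
The guidance function quoted in the Software Dependencies and Experiment Setup rows can be pictured as a loss on decoded latents that combines a flow term and a color term, weighted by λ_flow and λ_color. The sketch below is a minimal illustration under assumptions, not the paper's implementation: `decoder` stands for any differentiable latent-to-pixel decoder (e.g. the Stable Diffusion VAE decoder), `flow_model` for a differentiable flow estimator such as RAFT, and `warp` is a generic backward-warping helper; the exact form of Eq. 4, including how the edit mask enters, follows the paper.

```python
import torch
import torch.nn.functional as F

# Weights and clipping quoted in the Experiment Setup row.
LAMBDA_COLOR = 100.0   # λ_color
LAMBDA_FLOW = 3.0      # λ_flow
GLOBAL_WEIGHT = 300.0  # global scale on the guidance gradient
GRAD_CLIP = 200.0      # c_g, gradient clipping threshold


def warp(image, flow):
    """Backward-warp `image` by `flow` (in pixels) with bilinear sampling."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=image.device),
        torch.arange(w, device=image.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).float()        # (2, H, W)
    coords = grid.unsqueeze(0) + flow                   # (B, 2, H, W)
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0             # normalize to [-1, 1]
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(image, torch.stack((gx, gy), dim=-1), align_corners=True)


def motion_guidance_grad(latent, src_image, target_flow, edit_mask, decoder, flow_model):
    """Gradient of a flow + color guidance loss with respect to a latent code.

    The decoder is precomposed with the loss so guidance can act directly on
    latent codes, as described in the Software Dependencies row.
    """
    latent = latent.detach().requires_grad_(True)
    edited = decoder(latent)                             # latent -> pixels

    # Flow term: estimated motion from source to edited image should match the target flow.
    est_flow = flow_model(src_image, edited)
    flow_loss = (est_flow - target_flow).abs().mean()

    # Color term: moved pixels should keep their source appearance
    # (mask usage shown schematically; the paper's Eq. 4 gives the exact form).
    color_loss = ((edited - warp(src_image, target_flow)).abs() * edit_mask).mean()

    loss = LAMBDA_FLOW * flow_loss + LAMBDA_COLOR * color_loss
    (grad,) = torch.autograd.grad(loss, latent)
    # Scale and clip the guidance gradient (element-wise clipping shown here;
    # the paper's exact clipping scheme may differ).
    return (GLOBAL_WEIGHT * grad).clamp(-GRAD_CLIP, GRAD_CLIP)
```

Since Stable Diffusion's latent space is 64×64 for 512×512 images, a copy of the edit mask applied in latent space would also be downsampled to that resolution, e.g. with `F.interpolate(edit_mask, size=(64, 64), mode="nearest")`, matching the quote above.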
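
The Experiment Setup row also mentions 500 DDIM steps and K = 10 recursive denoising steps. The control flow below is a schematic sketch only: `unet`, `ddim_step`, and `renoise` are hypothetical placeholders for the Stable Diffusion v1.4 noise predictor, one deterministic DDIM update, and one forward-diffusion re-noising step, and subtracting the guidance gradient from the DDIM output is just one common convention; where exactly the gradient enters the update follows the paper.

```python
NUM_STEPS = 500  # DDIM sampling steps quoted above
K = 10           # recursive denoising steps per timestep


def guided_ddim_sample(z_T, unet, ddim_step, renoise, guidance_grad):
    """Schematic guided DDIM loop with recursive denoising.

    All callables are illustrative placeholders:
      unet(z, t)           -> predicted noise for latent z at timestep t
      ddim_step(z, eps, t) -> one deterministic DDIM update, z_t -> z_{t-1}
      renoise(z, t)        -> one forward-diffusion step, z_{t-1} -> z_t
      guidance_grad(z, t)  -> scaled, clipped motion-guidance gradient
    """
    z = z_T
    for t in reversed(range(NUM_STEPS)):
        for _ in range(K):
            eps = unet(z, t)
            # Guided update: nudge the denoising step against the guidance gradient.
            z_prev = ddim_step(z, eps, t) - guidance_grad(z, t)
            # Recursive denoising: re-noise back to level t and denoise again,
            # so the guidance can act repeatedly at every noise level.
            z = renoise(z_prev, t)
        z = z_prev
    return z
```

At K = 10 recursions over 500 steps, each edit involves on the order of 5,000 guided denoising passes, which is consistent with the single-GPU (NVIDIA A40) setup quoted in the table.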