General Articulated Objects Manipulation in Real Images via Part-Aware Diffusion Process

Authors: Zhou Fang, Yong-Lu Li, Lixin Yang, Cewu Lu

NeurIPS 2024

Reproducibility assessment. Each variable below is listed with its result and the supporting LLM response (quoted from the paper):
Research Type: Experimental
"Extensive experiments are provided to illustrate the advanced manipulation capabilities of our method compared with state-of-the-art editing works. Additionally, we verify our method on 3D articulated object understanding for embodied robot scenarios, and the promising results show that our method strongly supports this task."
Researcher Affiliation: Academia
Zhou Fang, Yong-Lu Li, Lixin Yang, Cewu Lu; Shanghai Jiao Tong University; {joefang, yonglu_li, siriusyang, lucewu}@sjtu.edu.cn
Pseudocode: Yes
"To facilitate the understanding of our proposed PA-Diffusion model, we present the entire algorithm pipeline in Algorithm 1." (Appendix A, "Additional algorithm pipeline of the PA-Diffusion model"; a sketch of the generation stage of this pipeline appears after the Experiment Setup entry below.)
Open Source Code: Yes
Checklist question: "Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?" Answer: [Yes]. Justification: "The code will be published."
Open Datasets: Yes
"Second, we create a synthetic training set to support the challenging 3D articulated object understanding task in robotic scenarios. To alleviate the data limitation, we create a synthetic dataset with the PA-Diffusion model. The dataset includes 660 sequential samples."
Dataset Splits: No
"First, the generated sequential samples are divided into training/testing sets (612/48). The fine-tuned model is evaluated on the testing set (6,231 real images) of the Internet Video Dataset." The paper only explicitly mentions training and testing sets, with no separate validation split; a sketch of the reported 612/48 split follows.
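The paper reports only the 612/48 counts over the 660 synthetic sequential samples, not how the split is drawn. A minimal sketch, assuming a seeded random partition; the function name and seed are hypothetical:

```python
import random

def split_sequential_samples(samples, n_test=48, seed=0):
    """Partition the generated sequential samples into train/test sets.

    Hypothetical helper: the paper reports only the split sizes (612/48),
    so a seeded random partition is assumed here.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    test_idx = set(indices[:n_test])
    train = [s for i, s in enumerate(samples) if i not in test_idx]
    test = [s for i, s in enumerate(samples) if i in test_idx]
    return train, test

# 660 sequential samples -> 612 train / 48 test
train, test = split_sequential_samples(list(range(660)))
assert (len(train), len(test)) == (612, 48)
```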
Hardware Specification: Yes
"All experiments run on a single NVIDIA A100 GPU."
Software Dependencies: Yes
"The fundamental diffusion model is Stable Diffusion V1-5."
Experiment Setup: Yes
"In this work, we select Grounded Segment Anything [21, 29] to obtain the initial part-level object segmentation masks. T2I-Adapter [33] is chosen as the conditional generation model, and the condition we use is the sketch map. The fundamental diffusion model is Stable Diffusion V1-5. All experiments run on a single NVIDIA A100 GPU. Notably, NO models need to be trained or fine-tuned in the image editing process. The Primitive Prototype Library is built within Blender [10]. 3D planes, cubes, boxes, and other 3D primitive shapes are created and combined to represent different objects. In this work, 5 primitive shapes are collected and combined to represent 6 categories of articulated objects."
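Since no model is trained or fine-tuned, the generation stage of this setup can be wired together from off-the-shelf checkpoints. A minimal sketch using Hugging Face diffusers, assuming the public Stable Diffusion v1-5 and TencentARC sketch-adapter checkpoints; the file names and prompt are placeholders, and this is not the authors' released code (the full pipeline also uses part-level masks from Grounded Segment Anything, omitted here):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionAdapterPipeline, T2IAdapter

# Sketch-conditioned T2I-Adapter on top of Stable Diffusion v1-5, matching the
# components named in the setup (checkpoint IDs are assumptions).
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2iadapter_sketch_sd15v2", torch_dtype=torch.float16
)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", adapter=adapter, torch_dtype=torch.float16
).to("cuda")  # a single A100 suffices, since this is inference only

# Placeholder: in the paper, the sketch map encodes the manipulated part pose
# (e.g. a door rotated to its opened state).
sketch_map = Image.open("manipulated_part_sketch.png").convert("RGB")

# No training or fine-tuning: one conditional sampling pass per edit.
edited = pipe(
    prompt="a cabinet with its door open",
    image=sketch_map,
    num_inference_steps=50,
).images[0]
edited.save("edited.png")
```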
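The Primitive Prototype Library itself is a Blender construction: a few primitive shapes combined per object category. A hypothetical bpy sketch of one such prototype, a cabinet body plus a hinged door built from two cubes; the dimensions, object names, and hinge handling are assumptions, not the paper's actual library:

```python
import bpy

def make_cabinet_prototype(body_size=(1.0, 0.6, 1.2), door_thickness=0.05):
    """Combine primitive cubes into a cabinet body plus a hinged door.

    Hypothetical prototype in the spirit of the paper's Blender-built
    Primitive Prototype Library; all dimensions are assumptions.
    """
    w, d, h = body_size

    # Cabinet body: one scaled cube resting on the ground plane.
    bpy.ops.mesh.primitive_cube_add(size=1.0, location=(0.0, 0.0, h / 2))
    body = bpy.context.active_object
    body.scale = (w, d, h)
    body.name = "cabinet_body"

    # Hinge: an empty at the front edge; rotating it articulates the door.
    bpy.ops.object.empty_add(location=(w / 2, -d / 2, h / 2))
    hinge = bpy.context.active_object
    hinge.name = "door_hinge"

    # Door: a thin cube parented to the hinge, centered on the front face.
    bpy.ops.mesh.primitive_cube_add(size=1.0)
    door = bpy.context.active_object
    door.scale = (door_thickness, d, h)
    door.name = "cabinet_door"
    door.parent = hinge
    door.location = (0.0, d / 2, 0.0)  # position relative to the hinge

    return body, hinge, door

body, hinge, door = make_cabinet_prototype()
hinge.rotation_euler[2] = 0.8  # radians: pose the door partially open
```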