General Articulated Objects Manipulation in Real Images via Part-Aware Diffusion Process
Authors: Zhou Fang, Yong-Lu Li, Lixin Yang, Cewu Lu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments illustrate the advanced manipulation capabilities of our method compared with state-of-the-art editing works. Additionally, we verify our method on 3D articulated object understanding for embodied-robot scenarios, and the promising results show that it strongly supports this task. |
| Researcher Affiliation | Academia | Zhou Fang, Yong-Lu Li, Lixin Yang, Cewu Lu (Shanghai Jiao Tong University); {joefang, yonglu_li, siriusyang, lucewu}@sjtu.edu.cn |
| Pseudocode | Yes | Appendix A (Additional algorithm pipeline of the PA-Diffusion model): To facilitate the understanding of our proposed PA-Diffusion model, we present the entire algorithm pipeline in Algorithm 1. |
| Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: The code will be published. |
| Open Datasets | Yes | Second, we create a synthetic training set to support the challenging 3D articulated object understanding task in robotic scenarios. To relieve the data limitation, we create a synthetic dataset with the PA-Diffusion model. The dataset includes 660 sequential samples. |
| Dataset Splits | No | First, the generated sequential samples are divided into training/testing sets (612/48). The fine-tuned model is evaluated on the testing set (6,231 real images) of the Internet Video Dataset. The text explicitly mentions only training and testing sets, without a separate validation split. |
| Hardware Specification | Yes | All experiments run on a single NVIDIA A100 GPU. |
| Software Dependencies | Yes | The fundamental diffusion model is Stable Diffusion V1-5. |
| Experiment Setup | Yes | In this work, we select Grounded Segment Anything [21, 29] to obtain the initial part-level object segmentation masks. T2I Adapter [33] is chosen as the conditional generation model, and the condition we use is the sketch map. The fundamental diffusion model is Stable Diffusion V1-5. All experiments run on a single NVIDIA A100 GPU. Notably, NO models need to be trained or fine-tuned in the image editing process. The Primitive Prototype Library is built within Blender [10]. Planes, cubes, boxes, and other 3D primitive shapes are created and combined to represent different objects. In this work, 5 primitive shapes are collected and combined to represent 6 categories of articulated objects. |
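The Experiment Setup row names an off-the-shelf stack (Stable Diffusion V1-5 conditioned by a sketch-map T2I Adapter, with no training or fine-tuning), so reproducing this part is largely a matter of wiring pretrained components together. The sketch below is a minimal illustration of that stack using the `diffusers` library; the checkpoint IDs, file names, and prompt are assumptions for illustration, not the authors' released code, and the part-aware sketch rendering from Grounded Segment Anything masks is assumed to happen upstream.

```python
# Minimal sketch of the SD 1.5 + sketch T2I-Adapter setup described above.
# Checkpoint IDs and inputs are illustrative assumptions, not the paper's code.
import torch
from PIL import Image
from diffusers import StableDiffusionAdapterPipeline, T2IAdapter

# Sketch-conditioned T2I-Adapter for SD 1.5 (assumed checkpoint choice).
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2iadapter_sketch_sd15v2", torch_dtype=torch.float16
)

# Stable Diffusion v1.5 as the base model; the paper reports a single A100.
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", adapter=adapter, torch_dtype=torch.float16
).to("cuda")

# `articulated_sketch.png` stands in for the part-aware sketch map obtained
# by articulating the part-level masks from Grounded Segment Anything.
sketch = Image.open("articulated_sketch.png").convert("RGB")

# No training or fine-tuning: a single conditioned sampling pass.
edited = pipe(
    prompt="a photo of a laptop with its lid half open",  # hypothetical prompt
    image=sketch,
    num_inference_steps=50,
).images[0]
edited.save("edited.png")
```

This mirrors the "no models need to be trained or fine-tuned" claim: every component is a frozen pretrained checkpoint, and the only paper-specific work is constructing the part-aware sketch condition before sampling.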