General Articulated Objects Manipulation in Real Images via Part-Aware Diffusion Process

Authors: Zhou Fang, Yong-Lu Li, Lixin Yang, Cewu Lu

NeurIPS 2024

Reproducibility assessment. Each variable below is listed with its result and the supporting LLM response (quoted from the paper):
Research Type: Experimental
"Extensive experiments are provided to illustrate the advanced manipulation capabilities of our method compared with state-of-the-art editing works. Additionally, we verify our method on 3D articulated object understanding for embodied robot scenarios, and the promising results show that our method strongly supports this task."
Researcher Affiliation: Academia
Zhou Fang, Yong-Lu Li, Lixin Yang, Cewu Lu; Shanghai Jiao Tong University; {joefang, yonglu_li, siriusyang, lucewu}@sjtu.edu.cn
Pseudocode: Yes
"To facilitate the understanding of our proposed PA-Diffusion model, we present the entire algorithm pipeline in Algorithm 1." (Appendix A, "Additional algorithm pipeline of the PA-Diffusion model"; a sketch of the generation stage of this pipeline appears after the Experiment Setup entry below.)
Open Source Code: Yes
Checklist question: "Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?" Answer: [Yes]. Justification: "The code will be published."
Open Datasets: Yes
"Second, we create a synthetic training set to support the challenging 3D articulated object understanding task in robotic scenarios. To alleviate the data limitation, we create a synthetic dataset with the PA-Diffusion model. The dataset includes 660 sequential samples."
Dataset Splits: No
"First, the generated sequential samples are divided into training/testing sets (612/48). The fine-tuned model is evaluated on the testing set (6,231 real images) of the Internet Video Dataset." The paper only explicitly mentions training and testing sets, with no separate validation split; a sketch of the reported 612/48 split follows.
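The paper reports only the 612/48 counts over the 660 synthetic sequential samples, not how the split is drawn. A minimal sketch, assuming a seeded random partition; the function name and seed are hypothetical:

```python
import random

def split_sequential_samples(samples, n_test=48, seed=0):
    """Partition the generated sequential samples into train/test sets.

    Hypothetical helper: the paper reports only the split sizes (612/48),
    so a seeded random partition is assumed here.
    """
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    test_idx = set(indices[:n_test])
    train = [s for i, s in enumerate(samples) if i not in test_idx]
    test = [s for i, s in enumerate(samples) if i in test_idx]
    return train, test

# 660 sequential samples -> 612 train / 48 test
train, test = split_sequential_samples(list(range(660)))
assert (len(train), len(test)) == (612, 48)
```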
Hardware Specification: Yes
"All experiments run on a single NVIDIA A100 GPU."
Software Dependencies: Yes
"The fundamental diffusion model is Stable Diffusion V1-5."
Experiment Setup: Yes
"In this work, we select Grounded Segment Anything [21, 29] to obtain the initial part-level object segmentation masks. T2I-Adapter [33] is chosen as the conditional generation model, and the condition we use is the sketch map. The fundamental diffusion model is Stable Diffusion V1-5. All experiments run on a single NVIDIA A100 GPU. Notably, NO models need to be trained or fine-tuned in the image editing process. The Primitive Prototype Library is built within Blender [10]. 3D planes, cubes, boxes, and other 3D primitive shapes are created and combined to represent different objects. In this work, 5 primitive shapes are collected and combined to represent 6 categories of articulated objects."
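Since no model is trained or fine-tuned, the generation stage of this setup can be wired together from off-the-shelf checkpoints. A minimal sketch using Hugging Face diffusers, assuming the public Stable Diffusion v1-5 and TencentARC sketch-adapter checkpoints; the file names and prompt are placeholders, and this is not the authors' released code (the full pipeline also uses part-level masks from Grounded Segment Anything, omitted here):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionAdapterPipeline, T2IAdapter

# Sketch-conditioned T2I-Adapter on top of Stable Diffusion v1-5, matching the
# components named in the setup (checkpoint IDs are assumptions).
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2iadapter_sketch_sd15v2", torch_dtype=torch.float16
)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", adapter=adapter, torch_dtype=torch.float16
).to("cuda")  # a single A100 suffices, since this is inference only

# Placeholder: in the paper, the sketch map encodes the manipulated part pose
# (e.g. a door rotated to its opened state).
sketch_map = Image.open("manipulated_part_sketch.png").convert("RGB")

# No training or fine-tuning: one conditional sampling pass per edit.
edited = pipe(
    prompt="a cabinet with its door open",
    image=sketch_map,
    num_inference_steps=50,
).images[0]
edited.save("edited.png")
```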
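The Primitive Prototype Library itself is a Blender construction: a few primitive shapes combined per object category. A hypothetical bpy sketch of one such prototype, a cabinet body plus a hinged door built from two cubes; the dimensions, object names, and hinge handling are assumptions, not the paper's actual library:

```python
import bpy

def make_cabinet_prototype(body_size=(1.0, 0.6, 1.2), door_thickness=0.05):
    """Combine primitive cubes into a cabinet body plus a hinged door.

    Hypothetical prototype in the spirit of the paper's Blender-built
    Primitive Prototype Library; all dimensions are assumptions.
    """
    w, d, h = body_size

    # Cabinet body: one scaled cube resting on the ground plane.
    bpy.ops.mesh.primitive_cube_add(size=1.0, location=(0.0, 0.0, h / 2))
    body = bpy.context.active_object
    body.scale = (w, d, h)
    body.name = "cabinet_body"

    # Hinge: an empty at the front edge; rotating it articulates the door.
    bpy.ops.object.empty_add(location=(w / 2, -d / 2, h / 2))
    hinge = bpy.context.active_object
    hinge.name = "door_hinge"

    # Door: a thin cube parented to the hinge, centered on the front face.
    bpy.ops.mesh.primitive_cube_add(size=1.0)
    door = bpy.context.active_object
    door.scale = (door_thickness, d, h)
    door.name = "cabinet_door"
    door.parent = hinge
    door.location = (0.0, d / 2, 0.0)  # position relative to the hinge

    return body, hinge, door

body, hinge, door = make_cabinet_prototype()
hinge.rotation_euler[2] = 0.8  # radians: pose the door partially open
```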