MIDGArD: Modular Interpretable Diffusion over Graphs for Articulated Designs

Authors: Quentin Leboutet, Nina Wiedemann, Zhipeng Cai, Michael Paulitsch, Kai Yuan

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show the superiority of MIDGArD on the quality, consistency, and interpretability of the generated assets. Importantly, the generated models are fully simulatable, i.e., can be seamlessly integrated into standard physics engines such as MuJoCo, broadening MIDGArD’s applicability to fields such as digital content creation, meta realities, and robotics.
Researcher Affiliation | Industry | Quentin Leboutet, Nina Wiedemann, Zhipeng Cai, Michael Paulitsch, Kai Yuan; Intel Labs, XRL (eXtended Reality Laboratory); {firstname.lastname}@intel.com
Pseudocode | No | The paper includes figures illustrating pipelines, but no structured pseudocode or algorithm blocks.
Open Source Code | No | Code and models are available at https://quentin-leboutet.github.io/MIDGArD. Open-source code will be provided upon acceptance.
Open Datasets | Yes | Dataset: All experiments were conducted using the PartNet-Mobility dataset [97], which contains a diverse set of articulated 3D objects with detailed geometric and kinematic annotations.
Dataset Splits | No | The paper mentions a 'train-test split' but does not explicitly provide details of a separate validation split, nor the split percentages or counts.
Hardware Specification | Yes | We trained the structure generator and the image VQ-VAE on an NVIDIA RTX 3090 GPU, while the shape generator was trained on an NVIDIA RTX 6000 GPU. Evaluation took place on a single NVIDIA RTX 3090 GPU.
Software Dependencies | No | The paper mentions various software components (e.g., SDFusion, BERT, ResNet-18, VQ-VAE, MuJoCo) and Python as the programming language, but does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | The denoising model used in the structure generator contains six graph attention blocks, with a latent embedding size of 512 and 32 attention heads. We set the maximum number of nodes in the graph to N = 8. Our training parameters closely follow those in NAP, with the key difference being the use of an implicit denoising diffusion pipeline [80] over 100 time steps, as opposed to a DDPM with 1,000 time steps. Our shape generator is adapted from SDFusion [8] and trained on the PartNet-Mobility dataset. We used the same hyperparameters as the multimodal model in SDFusion and utilized their pre-trained VQ-VAE checkpoint. We excluded 10 categories from training due to their objects containing numerous equally-shaped parts (e.g., keyboards with over 30 keys). For encoding the part dimensions, we use an MLP with three hidden layers (of size 16, 64, and 256, respectively).
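
To make the quoted hyperparameters concrete, here is a minimal sketch, assuming PyTorch, of how the structure-generator settings and the part-dimension MLP encoder could be declared. The constant names, the class PartDimensionEncoder, and its input/output dimensions are hypothetical illustrations rather than the authors' (unreleased) implementation; the paper only specifies the hidden-layer sizes 16, 64, and 256 and the configuration values recorded in the constants.

```python
import torch
import torch.nn as nn

# Structure-generator hyperparameters as quoted in the paper.
NUM_ATTENTION_BLOCKS = 6   # six graph attention blocks in the denoising model
EMBED_DIM = 512            # latent embedding size
NUM_HEADS = 32             # attention heads per block
MAX_NODES = 8              # maximum number of graph nodes N
DDIM_STEPS = 100           # implicit (DDIM-style) diffusion over 100 steps, vs. 1,000 DDPM steps


class PartDimensionEncoder(nn.Module):
    """Hypothetical MLP encoding per-part dimensions with three hidden layers (16, 64, 256).

    The input size (3, e.g. a bounding-box extent) and the output size (EMBED_DIM)
    are assumptions; only the hidden-layer sizes come from the paper.
    """

    def __init__(self, in_dim: int = 3, out_dim: int = EMBED_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 16), nn.ReLU(),
            nn.Linear(16, 64), nn.ReLU(),
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, out_dim),
        )

    def forward(self, dims: torch.Tensor) -> torch.Tensor:
        return self.net(dims)


# Usage example: encode dummy dimensions for a graph with MAX_NODES parts.
encoder = PartDimensionEncoder()
part_dims = torch.rand(MAX_NODES, 3)   # (N, 3) placeholder part dimensions
embeddings = encoder(part_dims)        # (N, EMBED_DIM)
print(embeddings.shape)                # torch.Size([8, 512])
```

The graph attention blocks and the DDIM sampler themselves are not reproduced here; the constants above only record the configuration quoted in the table.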