Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

CADMorph: Geometry‑Driven Parametric CAD Editing via a Plan–Generate–Verify Loop

Authors: Weijian Ma, Shizhao Sun, Ruiyu Wang, Jiang Bian

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments demonstrate that CADMorph outperforms state-of-the-art general-purpose models (GPT-4o [OpenAI, 2024]) and powerful CAD-specific baselines [Ma et al., 2024, Zhang et al., 2024] both quantitatively and qualitatively. In addition, we showcase two downstream applications, iterative geometry editing and reverse-engineering refinement, highlighting CADMorph's versatility in real-world design workflows. Our key contributions are summarized as follows:... and Table 1: Quantitative results. IoU, mean CD, and median CD quantify how closely the shape rendered from the edited sequence matches the target geometry. Edit Dist. measures how much the edited sequence diverges from the original sequence. IR is the percentage of edited sequences that cannot be rendered into a valid shape, and JSD is the distributional gap between the generated and target shapes. Human Eval. reports the average rank assigned by human annotators.
Researcher Affiliation	Collaboration	Weijian Ma Fudan University EMAIL Shizhao Sun Microsoft Research, Asia EMAIL Ruiyu Wang University of Toronto EMAIL Jiang Bian Microsoft Research, Asia EMAIL
Pseudocode	No	Figure 2: General pipeline of CADMorph. (a) Iterative editing loop. For each round r R: (i) The planning stage selects the editing location via peeking the cross-attention map of the P2S model. (ii) The generation stage propose candidate sequences via the MPP model (a finetuned LLM). (iii) The verification stage selects the candidate sequence best matches the target shape for round r. The parameters of each primitive in C are omitted for brevity. (b) Architecture of P2S and MPP model.
Open Source Code	No	Answer: [No] Justification: The code will be released upon acceptance and passed the code review.
Open Datasets	Yes	Datasets. We train both the P2S and MPP model on the DeepCAD corpus [Wu et al., 2021], which contains about 130k CAD models after removing non-renderable shapes. We keep the official train/validation/test splits. Parametric construction sequences follow the format of Ma et al. [2024]; voxelised SDFs are obtained with PythonOCC [Paviot and Contributors, 2025], Trimesh [Dawson-Haggerty and Contributors, 2025], and meshtosdf [Kleineberg, 2025]. For evaluation, we adopt the 2k test set from CAD-Editor [Yuan et al., 2025].
Dataset Splits	Yes	We train both the P2S and MPP model on the DeepCAD corpus [Wu et al., 2021], which contains about 130k CAD models after removing non-renderable shapes. We keep the official train/validation/test splits. Parametric construction sequences follow the format of Ma et al. [2024]; voxelised SDFs are obtained with PythonOCC [Paviot and Contributors, 2025], Trimesh [Dawson-Haggerty and Contributors, 2025], and meshtosdf [Kleineberg, 2025]. For evaluation, we adopt the 2k test set from CAD-Editor [Yuan et al., 2025].
Hardware Specification	Yes	The MPP model is finetuned from Llama3-8b-Instruct [Meta AI, 2024] with a LoRA [Hu et al., 2022] rank of 32 under the batch of 16 for 60 epochs on 8 A100-40GB-SXM GPUs. The initial learning rate is set to 5e-4 with maximal token length of 1024. The P2S model is trained on the same hardware with a total batch size of 8 and an initial learning rate of 5e-5 for 600k steps.
Software Dependencies	No	The MPP model is finetuned from Llama3-8b-Instruct [Meta AI, 2024] with a LoRA [Hu et al., 2022] rank of 32 under the batch of 16 for 60 epochs on 8 A100-40GB-SXM GPUs. ... Parametric construction sequences follow the format of Ma et al. [2024]; voxelised SDFs are obtained with PythonOCC [Paviot and Contributors, 2025], Trimesh [Dawson-Haggerty and Contributors, 2025], and meshtosdf [Kleineberg, 2025].
Experiment Setup	Yes	Implementation Details. The MPP model is finetuned from Llama3-8b-Instruct [Meta AI, 2024] with a LoRA [Hu et al., 2022] rank of 32 under the batch of 16 for 60 epochs on 8 A100-40GB-SXM GPUs. The initial learning rate is set to 5e-4 with maximal token length of 1024. The P2S model is trained on the same hardware with a total batch size of 8 and an initial learning rate of 5e-5 for 600k steps. The maximum iteration of the plan-generation-verify framework is set as 10.
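
For readers attempting a reproduction, the setup quoted in the Hardware Specification and Experiment Setup rows can be collected into a single configuration object. The following is only a sketch: the class and field names (MPPConfig, P2SConfig, RunConfig, and so on) are hypothetical and do not come from the paper or any released code, and the learning rates are read as 5e-4 and 5e-5 on the assumption that the exponent sign was lost in text extraction.

```python
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class MPPConfig:
    """Reported fine-tuning settings for the MPP model (field names are hypothetical)."""
    base_model: str = "Llama3-8b-Instruct"
    lora_rank: int = 32
    batch_size: int = 16
    epochs: int = 60
    learning_rate: float = 5e-4  # assumed reading of the reported "5e 4"
    max_token_length: int = 1024


@dataclass(frozen=True)
class P2SConfig:
    """Reported training settings for the P2S model (field names are hypothetical)."""
    total_batch_size: int = 8
    learning_rate: float = 5e-5  # assumed reading of the reported "5e 5"
    train_steps: int = 600_000


@dataclass(frozen=True)
class RunConfig:
    """Hardware and loop settings shared by both models, as quoted in the table."""
    mpp: MPPConfig = field(default_factory=MPPConfig)
    p2s: P2SConfig = field(default_factory=P2SConfig)
    num_gpus: int = 8
    gpu_type: str = "A100-40GB-SXM"
    max_edit_rounds: int = 10  # maximum plan-generate-verify iterations


config = RunConfig()
config_dict = asdict(config)  # nested plain dict, convenient for logging
```

Values the table does not report (optimizer, learning-rate schedule, library versions, random seeds) are deliberately left out rather than guessed; the Software Dependencies row is marked "No" for that reason.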