Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
ShapeCraft: LLM Agents for Structured, Textured and Interactive 3D Modeling
Authors: Shuyuan Zhang, ChenHan Jiang, Zuoou Li, Jiankang Deng
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Qualitative and quantitative experiments demonstrate ShapeCraft's superior performance in generating geometrically accurate and semantically rich 3D assets compared to existing LLM-based agents. We further show the versatility of ShapeCraft through examples of animated and user-customized editing, highlighting its potential for broader interactive applications. |
| Researcher Affiliation | Academia | 1 Imperial College London; 2 Hong Kong University of Science and Technology |
| Pseudocode | Yes | Algorithm 1: Iterative Shape Modeling with Multi-path Sampling |
| Open Source Code | No | Answer: [No] Justification: will be released after acceptance. |
| Open Datasets | Yes | We benchmark on 26 long-form functional prompts from MARVEL-40M+ [52], itself derived from Objaverse [12]. |
| Dataset Splits | Yes | All evaluations are performed on the exported meshes. We benchmark on 26 long-form functional prompts from MARVEL-40M+ [52], itself derived from Objaverse [12]. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud providers used for running experiments. |
| Software Dependencies | Yes | We employ the same Qwen3-235B-A22B with thinking disabled as Parser and Coder agents... And Qwen-VL-Max as the Evaluator agent. |
| Experiment Setup | Yes | For shape modeling, we set the number of path M = 3 and the iterative update step T = 3 for each node. More experiment settings can be found in Appendix Section B. ... we set a uniform sampling temperature of 0.5 across all LLM and VLM queries, allowing up to three retries in terms of network failure; the visual evaluation score is ranged from 0 to 10 and an early-stopping threshold of 9 is applied; we allow up to one update of the GPS representation G during representation bootstrapping, effectively setting N = 1. |
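
The settings quoted in the Experiment Setup row can be read as a small configuration plus a sampling loop. The Python sketch below is only an illustration of how those values might interact; it is not the paper's Algorithm 1. Every name in it (`SetupConfig`, `call_with_retries`, `model_node`, and the `generate` and `evaluate` callables) is hypothetical. It assumes that paths are sampled one after another, that reaching the early-stopping threshold ends modeling for the node, and that a network failure surfaces as a `ConnectionError`.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class SetupConfig:
    """Values from the Experiment Setup row; field names are hypothetical."""
    num_paths: int = 3             # M: number of sampled paths per node
    num_update_steps: int = 3      # T: iterative update steps per node
    temperature: float = 0.5       # uniform across LLM and VLM queries
    max_retries: int = 3           # retries allowed on network failure
    early_stop_score: float = 9.0  # threshold on the 0-10 visual score
    max_gps_updates: int = 1       # N: updates of the GPS representation


def call_with_retries(fn: Callable[[], str], max_retries: int) -> str:
    """Call fn once, then retry up to max_retries times on ConnectionError."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_retries:
                raise
    raise RuntimeError("unreachable")


def model_node(
    generate: Callable[[Optional[str], int, int], str],
    evaluate: Callable[[str], float],
    cfg: SetupConfig,
) -> Tuple[Optional[str], float]:
    """Return the best-scoring candidate over M paths of up to T steps each.

    generate(previous, path, step) and evaluate(candidate) are placeholder
    callables standing in for the Coder and Evaluator agent queries.
    """
    best, best_score = None, float("-inf")
    for path in range(cfg.num_paths):
        candidate: Optional[str] = None
        for step in range(cfg.num_update_steps):
            previous = candidate
            candidate = call_with_retries(
                lambda: generate(previous, path, step), cfg.max_retries
            )
            score = evaluate(candidate)
            if score > best_score:
                best, best_score = candidate, score
            # Assumption: a score at or above the threshold stops the node.
            if score >= cfg.early_stop_score:
                return best, best_score
    return best, best_score
```

With the defaults above, a node costs at most M × T = 9 generate/evaluate rounds, and fewer whenever a candidate reaches the threshold of 9 early.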