Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code
Authors: Ziniu Hu, Ahmet Iscen, Aashi Jain, Thomas Kipf, Yisong Yue, David A Ross, Cordelia Schmid, Alireza Fathi
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation demonstrates that SceneCraft surpasses existing LLM-based agents in rendering complex scenes, as shown by its adherence to constraints and favorable human assessments. We conduct comprehensive experiments on both synthetic and real-world datasets. |
| Researcher Affiliation | Collaboration | ¹California Institute of Technology, ²Google DeepMind. |
| Pseudocode | Yes | The pseudo-code of the whole dual-loop learning is illustrated in Alg 1. |
| Open Source Code | No | The paper does not provide a concrete statement or a specific link to the source code for the methodology described in this paper. |
| Open Datasets | Yes | For the Sintel movie dataset... Sintel Movie, which is an animated fantasy short film produced with Blender, where scripts and Blender scenes are open sourced (https://studio.blender.org/films/sintel/). We download all these scenes, using the first half as the training set and the remaining half for testing. |
| Dataset Splits | No | The paper states 'using the first half as the training set and the remaining half for testing' but does not explicitly mention a separate validation split or its proportions. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like Blender, GPT-4V, VideoPoet, and txtai but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | In our refinement algorithm, the average number of iterations is hard-coded as 4 steps without early stopping. The maximum number of subproblems of the current system is 7, so the maximum number of tokens for each scene design is around 15k, and the average is 6k. |
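The experiment-setup quote above pins down two concrete parameters: refinement always runs exactly 4 iterations with no early stopping, and a scene is decomposed into at most 7 subproblems. A minimal sketch of such a fixed-iteration loop is below; the function and variable names (`refine_scene`, `improve`) are hypothetical illustrations, not the paper's actual implementation, and the `improve` callable stands in for the LLM-driven revision step.

```python
# Hypothetical sketch of the fixed-iteration refinement loop described in the
# experiment setup: exactly 4 refinement rounds, no early stopping, and at
# most 7 subproblems per scene. All names here are illustrative.

MAX_ITERATIONS = 4   # hard-coded per the paper; no early-stopping check
MAX_SUBPROBLEMS = 7  # cap on subproblems in the current system

def refine_scene(subproblems, improve):
    """Apply `improve` to every subproblem for a fixed number of rounds.

    `improve` stands in for the revision step (e.g. critiquing rendered
    output and rewriting scene code); here it is any callable.
    """
    if len(subproblems) > MAX_SUBPROBLEMS:
        raise ValueError(f"at most {MAX_SUBPROBLEMS} subproblems supported")
    state = list(subproblems)
    for _ in range(MAX_ITERATIONS):  # always runs all 4 iterations
        state = [improve(s) for s in state]
    return state

# Toy usage: each "improvement" halves a numeric error score.
print(refine_scene([8.0, 16.0], lambda err: err / 2))  # -> [0.5, 1.0]
```

Because the iteration count is fixed rather than convergence-based, cost per scene is predictable, which is consistent with the paper's token-budget estimate (around 15k tokens maximum, 6k on average).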