SceneScape: Text-Driven Consistent Scene Generation
Authors: Rafail Fridman, Amit Abecasis, Yoni Kasten, Tali Dekel
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We thoroughly evaluate and ablate our method, demonstrating a significant improvement in quality and 3D consistency over existing methods. |
| Researcher Affiliation | Collaboration | Rafail Fridman (Weizmann Institute of Science, rafail.fridman@weizmann.ac.il); Amit Abecasis (Weizmann Institute of Science, amit.abecasis@weizmann.ac.il); Yoni Kasten (NVIDIA Research, ykasten@nvidia.com); Tali Dekel (Weizmann Institute of Science, tali.dekel@weizmann.ac.il) |
| Pseudocode | No | The paper describes its method through text and diagrams but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a project page link (https://scenescape.github.io/) but not a direct link to a source-code repository for the methodology. |
| Open Datasets | Yes | To compare to GEN-1, we used the RealEstate10K dataset [66], consisting of curated Internet videos and corresponding camera poses. |
| Dataset Splits | No | The paper describes training and testing procedures but does not explicitly provide details about a dedicated validation dataset split, its size, or its percentage. |
| Hardware Specification | Yes | Synthesizing 50 frame-long videos with our full method takes approximately 2.5 hours on an NVIDIA Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions several models and tools (Stable Diffusion, DDIM scheduler, MiDaS DPT-Large, PyTorch3D) but does not provide specific version numbers for these software dependencies or other libraries. A hedged loading sketch for these components appears after the table. |
| Experiment Setup | Yes | For each generated frame, we finetune it for 300 epochs, using Adam optimizer [25] with a learning rate of 1e-7. Additionally, we revert the weights of the depth prediction model to the initial state, as discussed in Sec. 3.3. We finetune the LDM decoder for 100 epochs on each generation step using Adam optimizer with a learning rate of 1e-4. A minimal sketch of this schedule appears after the table. |
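Since the paper names its components (Stable Diffusion, a DDIM scheduler, MiDaS DPT-Large, PyTorch3D) but pins no versions, the sketch below shows one plausible way to assemble them in Python. The checkpoint identifier and the choice of the inpainting pipeline variant are assumptions, not details stated in the paper.

```python
# Hedged sketch: loads the components named in the paper. The checkpoint name
# and the use of the inpainting pipeline are assumptions; the paper does not
# pin versions for any of these dependencies.
import torch
from diffusers import StableDiffusionInpaintPipeline, DDIMScheduler
from pytorch3d.structures import Meshes  # mesh representation for the scene

# Text-conditioned inpainting pipeline with a DDIM scheduler.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # assumed checkpoint
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

# MiDaS DPT-Large depth predictor, via the official torch.hub entry point.
midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large").to("cuda").eval()
```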
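The quoted finetuning schedule translates directly into two small training loops. The sketch below is a minimal rendering of that schedule, assuming hypothetical handles `depth_model`, `ldm_decoder`, `frames`, `depth_loss`, and `decoder_loss`; it is not the authors' code.

```python
import torch

# Hedged sketch of the quoted finetuning schedule. `depth_model`,
# `ldm_decoder`, `frames`, `depth_loss`, and `decoder_loss` are hypothetical
# placeholders for the paper's actual models and objectives.

def finetune(model, loss_fn, frame, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model, frame).backward()
        opt.step()

# Snapshot the pretrained depth weights once, so they can be reverted later.
init_depth = {k: v.clone() for k, v in depth_model.state_dict().items()}

for frame in frames:  # each newly generated frame
    # Revert the depth predictor to its initial state (Sec. 3.3), then
    # finetune it for 300 epochs at lr 1e-7.
    depth_model.load_state_dict(init_depth)
    finetune(depth_model, depth_loss, frame, epochs=300, lr=1e-7)
    # Finetune the LDM decoder for 100 epochs at lr 1e-4 on each step.
    finetune(ldm_decoder, decoder_loss, frame, epochs=100, lr=1e-4)
```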