DORSal: Diffusion for Object-centric Representations of Scenes
Authors: Allan Jabri, Sjoerd van Steenkiste, Emiel Hoogeboom, Mehdi S. M. Sajjadi, Thomas Kipf
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate DORSal on challenging synthetic and real-world scenes in three settings: 1) we compare the ability to synthesize novel views of a scene with related approaches, 2) we analyze the capability for simple scene edits: object removal and object transfer between scenes, and 3) we investigate the ability of DORSal to render smooth, view-consistent camera paths. We provide detailed ablations in Appendix C.1. |
| Researcher Affiliation | Collaboration | Allan Jabri, UC Berkeley; Sjoerd van Steenkiste, Google Research; Emiel Hoogeboom, Google DeepMind; Mehdi S. M. Sajjadi, Google DeepMind; Thomas Kipf, Google DeepMind |
| Pseudocode | No | The paper describes its methods and procedures in narrative text and uses diagrams (e.g., Figure 1, 2) to illustrate architectures, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: 'We would like to thank Daniel Watson for making the 3DiM codebase readily available for comparison, and help with debugging and onboarding new datasets.' This refers to the codebase of a baseline method (3DiM), not the authors' own code for DORSal. There is no explicit statement or link indicating that the source code for DORSal is being released. |
| Open Datasets | Yes | MultiShapeNet (MSN) (Sajjadi et al., 2022c)... Street View (SV) dataset... Street View imagery and permission for publication have been obtained from the authors (Google, 2007). |
| Dataset Splits | No | The paper mentions training models and evaluating them on a 'test set' (e.g., 'evaluate performance at novel-view synthesis on a test set of 1000 scenes'), but it does not explicitly provide details about a distinct 'validation' dataset split or its size/proportion. |
| Hardware Specification | Yes | We train DORSal on 8 TPU v4 (Jouppi et al., 2023) chips using a batch size of 8 for approx. one week to reach 1M steps. |
| Software Dependencies | No | The paper mentions software components like 'Adam' optimizer, but does not provide specific version numbers for any libraries, frameworks, or other ancillary software dependencies required for replication. |
| Experiment Setup | Yes | We train with a global batch size of 8 and classifier-free guidance with a conditioning dropout probability of 0.1 (and an inference guidance weight of 2). We report results after training for 1 000 000 steps... We use a median kernel size of 7 for all edit evaluations (incl. the baselines). |
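The quoted setup combines training-time conditioning dropout (probability 0.1) with inference-time classifier-free guidance (weight 2). Since DORSal's code is not released, the sketch below is a generic, illustrative rendering of that mechanism; the function names and the use of a zero tensor as the null conditioning are assumptions, not the paper's implementation.

```python
import numpy as np

def maybe_drop_conditioning(cond, dropout_prob=0.1, rng=None):
    """Training side: with probability `dropout_prob` (0.1 in the paper's
    setup), replace the conditioning with a null value so the model also
    learns an unconditional denoiser. Using zeros as the null token is an
    assumption for this sketch."""
    rng = rng or np.random.default_rng()
    if rng.random() < dropout_prob:
        return np.zeros_like(cond)
    return cond

def cfg_denoise(eps_cond, eps_uncond, guidance_weight=2.0):
    """Inference side: classifier-free guidance extrapolates from the
    unconditional prediction toward the conditional one. A weight of 1
    recovers the plain conditional prediction; the paper reports
    a guidance weight of 2."""
    return eps_uncond + guidance_weight * (eps_cond - eps_uncond)
```

At a guidance weight of 1 the combined prediction reduces to the conditional output, which is a quick sanity check when wiring this into a sampling loop.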