Object Scene Representation Transformer
Authors: Mehdi S. M. Sajjadi, Daniel Duckworth, Aravindh Mahendran, Sjoerd van Steenkiste, Filip Pavetic, Mario Lucic, Leonidas J. Guibas, Klaus Greff, Thomas Kipf
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To investigate OSRT's capabilities, we evaluate it on a range of datasets. After confirming that the proposed method outperforms existing methods on their comparably simple datasets, we move on to a more realistic, highly challenging dataset for all further investigations. We evaluate models by their novel view reconstruction quality and unsupervised scene decomposition capabilities, qualitatively and quantitatively. We further investigate OSRT's computational requirements compared to the baselines and close this section with some further analysis into which ingredients are crucial to enable OSRT's unsupervised scene decomposition qualities in challenging settings. |
| Researcher Affiliation | Industry | Google Research |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] The new MSN-Hard dataset is published on the website. |
| Open Datasets | Yes | CLEVR-3D [33]. This is a recently proposed multi-camera variant of the CLEVR dataset, which is popular for evaluating object decomposition due to its simple structure and unambiguous objects. |
| Dataset Splits | No | The paper specifies training and test set sizes (e.g., '35k training and 100 test scenes') but does not explicitly mention a separate validation set or how data was split for validation. |
| Hardware Specification | Yes | OSRT renders novel views at 32.5 fps (frames per second), more than 3000× faster than ObSuRF, which only achieves 0.01 fps, both measured on an Nvidia V100 GPU. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | No | The paper mentions the use of L2 reconstruction loss and the number of input views (e.g., '5 input views', '3 input views'), and details about slot initialization ('7 randomly initialized slots', '11 slots using a learned initialization'), but it does not specify concrete hyperparameter values such as learning rate, batch size, or optimizer settings within the provided text. |