Object Scene Representation Transformer

Authors: Mehdi S. M. Sajjadi, Daniel Duckworth, Aravindh Mahendran, Sjoerd van Steenkiste, Filip Pavetic, Mario Lucic, Leonidas J. Guibas, Klaus Greff, Thomas Kipf

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To investigate OSRT's capabilities, we evaluate it on a range of datasets. After confirming that the proposed method outperforms existing methods on their comparably simple datasets, we move on to a more realistic, highly challenging dataset for all further investigations. We evaluate models by their novel view reconstruction quality and unsupervised scene decomposition capabilities qualitatively and quantitatively. We further investigate OSRT's computational requirements compared to the baselines and close this section with some further analysis into which ingredients are crucial to enable OSRT's unsupervised scene decomposition qualities in challenging settings.
Researcher Affiliation | Industry | Google Research
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] The new MSN-Hard dataset is published on the website.
Open Datasets | Yes | CLEVR-3D [33]. This is a recently proposed multi-camera variant of the CLEVR dataset, which is popular for evaluating object decomposition due to its simple structure and unambiguous objects.
Dataset Splits | No | The paper specifies training and test set sizes (e.g., '35k training and 100 test scenes') but does not explicitly mention a separate validation set or how data was split for validation.
Hardware Specification | Yes | OSRT renders novel views at 32.5 fps (frames per second), more than 3000× faster than ObSuRF, which only achieves 0.01 fps, both measured on an Nvidia V100 GPU.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup | No | The paper mentions the use of an L2 reconstruction loss, the number of input views (e.g., '5 input views', '3 input views'), and details about slot initialization ('7 randomly initialized slots', '11 slots using a learned initialization'), but it does not specify concrete hyperparameter values such as learning rate, batch size, or optimizer settings within the provided text.
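
The Experiment Setup row quotes two slot-initialization schemes (randomly initialized slots vs. a learned initialization) and an L2 reconstruction loss, without further hyperparameters. The sketch below is not the authors' code; it only illustrates how these two ingredients are commonly implemented. The module name `SlotInit`, the slot dimension of 128, and the use of PyTorch are assumptions made for illustration.

```python
# Hedged sketch (not the paper's implementation): the two slot-initialization
# schemes and the L2 reconstruction loss mentioned in the Experiment Setup row.
# Names, dimensions, and the choice of PyTorch are illustrative assumptions.
import torch
import torch.nn as nn


class SlotInit(nn.Module):
    """Produces initial slots either by sampling from a shared Gaussian
    ("random") or from learned per-slot embeddings ("learned")."""

    def __init__(self, num_slots: int, slot_dim: int, mode: str = "random"):
        super().__init__()
        self.num_slots, self.slot_dim, self.mode = num_slots, slot_dim, mode
        if mode == "random":
            # Shared mean / log-std of the Gaussian from which slots are drawn.
            self.mu = nn.Parameter(torch.zeros(1, 1, slot_dim))
            self.log_sigma = nn.Parameter(torch.zeros(1, 1, slot_dim))
        elif mode == "learned":
            # One learned embedding per slot.
            self.slots = nn.Parameter(torch.randn(1, num_slots, slot_dim) * 0.02)
        else:
            raise ValueError(f"unknown mode: {mode}")

    def forward(self, batch_size: int) -> torch.Tensor:
        if self.mode == "random":
            noise = torch.randn(batch_size, self.num_slots, self.slot_dim)
            return self.mu + self.log_sigma.exp() * noise
        return self.slots.expand(batch_size, -1, -1)


if __name__ == "__main__":
    # '7 randomly initialized slots' vs. '11 slots using a learned
    # initialization', as quoted in the row above.
    random_init = SlotInit(num_slots=7, slot_dim=128, mode="random")
    learned_init = SlotInit(num_slots=11, slot_dim=128, mode="learned")
    slots = random_init(batch_size=4)  # shape: (4, 7, 128)

    # L2 reconstruction loss between rendered and ground-truth pixels
    # (dummy tensors stand in for actual renderings).
    pred_rgb = torch.rand(4, 3, 64, 64)
    target_rgb = torch.rand(4, 3, 64, 64)
    loss = ((pred_rgb - target_rgb) ** 2).mean()
    print(slots.shape, loss.item())
```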