Unsupervised Multi-Object Segmentation by Predicting Probable Motion Patterns
Authors: Laurynas Karazija, Subhabrata Choudhury, Iro Laina, Christian Rupprecht, Andrea Vedaldi
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the advantage of this approach over its deterministic counterpart and show state-of-the-art unsupervised object segmentation performance on simulated and real-world benchmarks, surpassing methods that use motion even at test time. As our approach is applicable to a variety of network architectures that segment the scene, we also apply it to existing image reconstruction-based models showing drastic improvement. |
| Researcher Affiliation | Academia | Visual Geometry Group University of Oxford Oxford, UK {laurynas,subha,iro,chrisr,vedaldi}@robots.ox.ac.uk |
| Pseudocode | No | The paper does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page and code: https://www.robots.ox.ac.uk/~vgg/research/ppmp. |
| Open Datasets | Yes | Datasets. We evaluate our method on video and still image datasets. For video-based data, we use the Multi-Object Video (MOVi) datasets, released as part of Kubric [20]. Specifically, we employ MOVi-{A,C,D,E} versions. ... To evaluate our method on still images, we use CLEVR [25] and CLEVRTEX [27] benchmark suites. ... Since our method requires optical flow during training, we extend the implementation of [27] to generate video datasets of CLEVR and CLEVRTEX scenes ... We also evaluate our method on the real-world KITTI [16] benchmark which depicts street scenes captured from a moving car. |
| Dataset Splits | Yes | We generate 10k sequences for MOVINGCLEVRTEX and 5k for MOVINGCLEVR, where we retain 1000 and 500 sequences, respectively, for validation. Each sequence is 5 frames long. ... We follow the setup of [3], using 147 videos for training, and evaluate on the instance segmentation subset which contains 200 annotated validation frames. (These split sizes are illustrated in the sketch after the table.) |
| Hardware Specification | Yes | The model takes approximately 48h to train on a single A30 24GB GPU. All training details and hyper-parameters are included in the Appendix. (Footnote: Approx. total compute in this paper: 100 GPU days for our models, 154 GPU days for comparisons.) |
| Software Dependencies | No | The paper mentions using Mask2Former [7], a ResNet-18 CNN backbone, a Swin-tiny transformer [36], and RAFT [48], but does not provide specific version numbers for these components or for any other libraries/environments. |
| Experiment Setup | No | The paper states 'All training details and hyper-parameters are included in the Appendix,' indicating that these details are deferred to the Appendix rather than given in the main text. Only general settings, such as network architectures, are mentioned, without specific values. |
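
For concreteness, the split sizes quoted in the Dataset Splits row can be written as a minimal sketch. This is an illustrative reconstruction, not the authors' released code: the helper name `split_sequences` and the seeded random shuffle are assumptions, since the paper does not state how the held-out sequences are chosen.

```python
# Minimal sketch of the train/validation splits reported in the paper:
# MOVingCLEVRTex: 10k sequences with 1,000 retained for validation;
# MOVingCLEVR:     5k sequences with   500 retained for validation;
# each sequence is 5 frames long. Names and shuffling are hypothetical.
import random

def split_sequences(num_sequences: int, num_val: int, seed: int = 0):
    """Return (train_ids, val_ids) over sequence indices."""
    ids = list(range(num_sequences))
    random.Random(seed).shuffle(ids)  # assumed: the paper does not specify the selection scheme
    return ids[num_val:], ids[:num_val]

train_ctex, val_ctex = split_sequences(10_000, 1_000)  # MOVingCLEVRTex
train_clevr, val_clevr = split_sequences(5_000, 500)   # MOVingCLEVR

FRAMES_PER_SEQ = 5  # each generated sequence is 5 frames long
print(len(train_ctex), len(val_ctex))    # 9000 1000
print(len(train_clevr), len(val_clevr))  # 4500 500
```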