SAVi++: Towards End-to-End Object-Centric Learning from Real-World Videos

Authors: Gamaleldin Elsayed, Aravindh Mahendran, Sjoerd van Steenkiste, Klaus Greff, Michael C. Mozer, Thomas Kipf

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The goal of our experimental evaluation is twofold: 1) on synthetic video data of varying complexity we would like to analyze the potential advantages of utilizing a depth signal and model scaling strategies for learning emergent segmentation and tracking, and 2) we would like to investigate whether these improvements enable bridging the gap to complex real-world video data. Section 4.1 covers both qualitative and quantitative comparisons of SAVi++ against baselines on the synthetic MOVi datasets. In Section 4.2, we perform an ablation study on SAVi++. Finally, in Section 4.3 we demonstrate and analyze results for a SAVi++ model applied to real-world driving videos from the Waymo Open [47] dataset.
Researcher Affiliation | Industry | Gamaleldin F. Elsayed, Aravindh Mahendran, Sjoerd van Steenkiste, Klaus Greff, Michael C. Mozer & Thomas Kipf, Google Research
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Project page: https://slot-attention-video.github.io/savi++/ ... For our code release, see our project website at https://slot-attention-video.github.io/savi++/.
Open Datasets | Yes | We use three synthetic Multi-Object Video (MOVi) datasets (Figure 3a) introduced in Kubric [16], which are created by simulating rigid body dynamics. ... We also train and evaluate SAVi++ in a real-world driving setting using the Waymo Open dataset (Figure 3b). Waymo Open is comprised of high resolution video data of 1280 × 1920 original resolution from a multi-camera system collected by Waymo vehicles [47].
Dataset Splits | Yes | The dataset consists of 798 train and 202 validation scenes of 20s video each, sampled at 10 fps.
Hardware Specification | Yes | We train SAVi++ for 500k steps on Tensor Processing Unit (TPU) accelerators with a batch size of 64 using Adam [29].
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4).
Experiment Setup | Yes | We train SAVi++ for 500k steps on Tensor Processing Unit (TPU) accelerators with a batch size of 64 using Adam [29]. We train on randomly sampled sub-sequences of only 6 frames using 24 slots for MOVi and 11 slots for Waymo Open.
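
The figures quoted in the Dataset Splits and Experiment Setup rows can be gathered into a small configuration sketch to make the implied arithmetic explicit. This is not the authors' code: the `TrainConfig` class, its field names, and the helper functions are hypothetical, and the calculations assume that each batch element is one 6-frame clip and that sub-sequences are contiguous windows, neither of which the excerpts above state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""
    dataset: str
    num_slots: int
    train_steps: int = 500_000   # "500k steps"
    batch_size: int = 64         # "batch size of 64"
    frames_per_clip: int = 6     # "sub-sequences of only 6 frames"
    optimizer: str = "adam"      # "using Adam"


# Slot counts as reported: 24 for MOVi, 11 for Waymo Open.
MOVI = TrainConfig(dataset="movi", num_slots=24)
WAYMO = TrainConfig(dataset="waymo_open", num_slots=11)

# Waymo Open split as reported in the Dataset Splits row.
WAYMO_TRAIN_SCENES = 798
WAYMO_VAL_SCENES = 202
SCENE_SECONDS = 20
FPS = 10


def frames_per_scene(seconds: int = SCENE_SECONDS, fps: int = FPS) -> int:
    """Nominal frame count of one scene (duration times sampling rate)."""
    return seconds * fps


def clips_seen(cfg: TrainConfig) -> int:
    """Clips processed over training, assuming one clip per batch element."""
    return cfg.train_steps * cfg.batch_size


def windows_per_scene(cfg: TrainConfig) -> int:
    """Distinct contiguous windows of frames_per_clip frames in one scene.

    Assumes sub-sequences are contiguous; the source only says
    "randomly sampled".
    """
    return frames_per_scene() - cfg.frames_per_clip + 1


if __name__ == "__main__":
    print("frames per scene:", frames_per_scene())
    print("total Waymo scenes:", WAYMO_TRAIN_SCENES + WAYMO_VAL_SCENES)
    print("clips seen in training:", clips_seen(WAYMO))
    print("6-frame windows per scene:", windows_per_scene(WAYMO))
```

Under these assumptions, a 20 s scene at 10 fps holds a nominal 200 frames, so each training clip covers only a small fraction of a scene.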