SAVi++: Towards End-to-End Object-Centric Learning from Real-World Videos
Authors: Gamaleldin Elsayed, Aravindh Mahendran, Sjoerd van Steenkiste, Klaus Greff, Michael C. Mozer, Thomas Kipf
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The goal of our experimental evaluation is twofold: 1) on synthetic video data of varying complexity we would like to analyze the potential advantages of utilizing a depth signal and model scaling strategies for learning emergent segmentation and tracking, and 2) we would like to investigate whether these improvements enable bridging the gap to complex real-world video data. Section 4.1 covers both qualitative and quantitative comparisons of SAVi++ against baselines on the synthetic MOVi datasets. In Section 4.2, we perform an ablation study on SAVi++. Finally, in Section 4.3 we demonstrate and analyze results for a SAVi++ model applied to real-world driving videos from the Waymo Open [47] dataset. |
| Researcher Affiliation | Industry | Gamaleldin F. Elsayed, Aravindh Mahendran, Sjoerd van Steenkiste, Klaus Greff, Michael C. Mozer & Thomas Kipf, Google Research |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Project page: https://slot-attention-video.github.io/savi++/ ... For our code release, see our project website at https://slot-attention-video.github.io/savi++/. |
| Open Datasets | Yes | We use three synthetic Multi-Object Video (MOVi) datasets (Figure 3a) introduced in Kubric [16], which are created by simulating rigid body dynamics. ... We also train and evaluate SAVi++ in a real-world driving setting using the Waymo Open dataset (Figure 3b). Waymo Open is comprised of high resolution video data of 1280 × 1920 original resolution from a multi-camera system collected by Waymo vehicles [47]. |
| Dataset Splits | Yes | The dataset consists of 798 train and 202 validation scenes of 20s video each, sampled at 10 fps. |
| Hardware Specification | Yes | We train SAVi++ for 500k steps on Tensor Processing Unit (TPU) accelerators with a batch size of 64 using Adam [29]. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4). |
| Experiment Setup | Yes | We train SAVi++ for 500k steps on Tensor Processing Unit (TPU) accelerators with a batch size of 64 using Adam [29]. We train on randomly sampled sub-sequences of only 6 frames using 24 slots for MOVi and 11 slots for Waymo Open. |
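
The setup values quoted in the table can be collected into a small configuration, together with a sketch of how a 6-frame training sub-sequence might be drawn from a longer video. This is an illustration only, not the authors' released code: the names `TrainConfig` and `sample_subsequence` are hypothetical, and choosing a contiguous window with a uniformly random start is an assumption, since the quoted text says only "randomly sampled sub-sequences".

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""
    num_steps: int = 500_000  # "500k steps"
    batch_size: int = 64      # "batch size of 64"
    seq_len: int = 6          # "sub-sequences of only 6 frames"
    num_slots: int = 24       # 24 slots for MOVi, 11 for Waymo Open


def sample_subsequence(num_frames, seq_len, rng):
    """Return frame indices of one random contiguous window.

    Assumption: windows are contiguous and the start index is uniform.
    """
    if num_frames < seq_len:
        raise ValueError("video is shorter than the requested sub-sequence")
    start = rng.randrange(num_frames - seq_len + 1)
    return list(range(start, start + seq_len))


MOVI = TrainConfig(num_slots=24)
WAYMO = TrainConfig(num_slots=11)

# A 20 s scene sampled at 10 fps has a nominal 200 frames.
frames_per_scene = 20 * 10
window = sample_subsequence(frames_per_scene, WAYMO.seq_len, random.Random(0))
```

Under these assumptions each training example covers 6 of the nominal 200 frames in a Waymo Open scene, so a single scene yields many distinct windows over the course of training.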