Learning What and Where: Disentangling Location and Identity Tracking Without Supervision
Authors: Manuel Traub, Sebastian Otte, Tobias Menge, Matthias Karlbauer, Jannik Thuemmel, Martin V. Butz
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As a main result, we observe superior performance on the CATER benchmark: Loci outperforms previous methods by a large margin with an order of magnitude fewer parameters. Additional evaluations on moving MNIST, an aquarium video footage, and on the CLEVRER benchmark underline Loci's contribution towards the self-organized, disentangled identification and localization of objects as well as the effective processing of object interaction dynamics from video data. |
| Researcher Affiliation | Academia | Manuel Traub, Neuro-Cognitive Modeling, University of Tübingen; Sebastian Otte, Neuro-Cognitive Modeling, University of Tübingen; Tobias Menge, Neuro-Cognitive Modeling, University of Tübingen; Matthias Karlbauer, Neuro-Cognitive Modeling, University of Tübingen; Jannik Thümmel, ML in Climate Science, University of Tübingen; Martin V. Butz, Neuro-Cognitive Modeling, University of Tübingen |
| Pseudocode | Yes | A detailed algorithmic description is provided in Appendix B. ... Algorithm 1 Loci-Algorithm (main processing loop) ... Algorithm 2 Priority-based-Attention |
| Open Source Code | Yes | Source Code: https://github.com/CognitiveModeling/Loci |
| Open Datasets | Yes | We evaluate Loci mainly on the CATER challenge [32]. ... The moving MNIST (MMNIST) challenge is a dataset for video prediction [77]. ... Additionally, we examine Gestalt preservation and indicators of intuitive physics in closed loop predictions on the CLEVRER dataset [93]. |
| Dataset Splits | Yes | As described in Girdhar & Ramanan [32], we split the dataset with a ratio of 70:30 into a training and test set and further put aside 20% of the training set as a validation set, leaving 56% of the original data for training. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions "OpenCV [9]" but does not specify a version number. It also mentions "rectified Adam optimization [RAdam, cf. 56]", but this is an optimization algorithm, not a specific software dependency with a version. |
| Experiment Setup | Yes | Loci is trained using a binary cross-entropy loss (LBCE) pixel-wise on the frame prediction applying rectified Adam optimization [RAdam, cf. 56]. Several regularizing losses are added to foster object permanence. Additionally, to speed up learning, we use truncated backpropagation through time and enhance the gradients with an e-prop-like accumulation of previous neural activities [7]. ... Another important aspect for successful training is the use of a warmup phase, where we mask the target of the network with a threshold τ: ... After about 30 000 updates, when the network has sufficiently learned to use the position encodings, we switch from the masked foreground reconstruction to the full reconstruction. ... To blend in the supervision loss, we first set the supervision factor µs = 0.01 for the first 4 epochs, fostering mostly unsupervised training, and then set µs = 0.1 for the duration of the training process and to µs = 0.3 during the last epochs, to give the Snitch location a weak pull towards the target location. |
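
The split arithmetic quoted in the Dataset Splits row (70:30 train/test, then 20% of the training portion held out for validation, leaving 56% for training) can be checked with a short sketch. The function names `split_fractions` and `split_counts` and the item count of 1000 are illustrative assumptions, not taken from the paper or its repository.

```python
from fractions import Fraction


def split_fractions(test_ratio=Fraction(30, 100),
                    val_share_of_train=Fraction(20, 100)):
    """Return (train, val, test) as exact fractions of the full dataset.

    Defaults follow the quoted description: a 70:30 train/test split,
    with 20% of the training portion set aside for validation.
    """
    train_val = 1 - test_ratio              # 70% before the validation cut
    val = train_val * val_share_of_train    # 20% of 70% = 14%
    train = train_val - val                 # remaining 56%
    return train, val, test_ratio


def split_counts(n_items, fractions=None):
    """Turn the fractions into item counts (hypothetical helper).

    Test and validation counts are rounded down; any remainder goes
    to the training set so the three counts always sum to n_items.
    """
    train, val, test = fractions or split_fractions()
    n_test = int(n_items * test)
    n_val = int(n_items * val)
    n_train = n_items - n_test - n_val
    return n_train, n_val, n_test


if __name__ == "__main__":
    train, val, test = split_fractions()
    print(f"train={float(train):.2f} val={float(val):.2f} test={float(test):.2f}")
    # 1000 is an arbitrary illustrative dataset size, not the real one.
    print(split_counts(1000))
```

Exact fractions are used instead of floats so that the 56% figure is reproduced without rounding error.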