Learning What and Where: Disentangling Location and Identity Tracking Without Supervision

Authors: Manuel Traub, Sebastian Otte, Tobias Menge, Matthias Karlbauer, Jannik Thuemmel, Martin V. Butz

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | As a main result, we observe superior performance on the CATER benchmark: Loci outperforms previous methods by a large margin with an order of magnitude fewer parameters. Additional evaluations on moving MNIST, aquarium video footage, and on the CLEVRER benchmark underline Loci's contribution towards the self-organized, disentangled identification and localization of objects as well as the effective processing of object interaction dynamics from video data.
Researcher Affiliation | Academia | Manuel Traub (Neuro-Cognitive Modeling, University of Tübingen); Sebastian Otte (Neuro-Cognitive Modeling, University of Tübingen); Tobias Menge (Neuro-Cognitive Modeling, University of Tübingen); Matthias Karlbauer (Neuro-Cognitive Modeling, University of Tübingen); Jannik Thümmel (ML in Climate Science, University of Tübingen); Martin V. Butz (Neuro-Cognitive Modeling, University of Tübingen)
Pseudocode | Yes | A detailed algorithmic description is provided in Appendix B. ... Algorithm 1 Loci-Algorithm (main processing loop) ... Algorithm 2 Priority-based-Attention
Open Source Code | Yes | Source Code: https://github.com/CognitiveModeling/Loci
Open Datasets | Yes | We evaluate Loci mainly on the CATER challenge [32]. ... The moving MNIST (MMNIST) challenge is a dataset for video prediction [77]. ... Additionally, we examine Gestalt preservation and indicators of intuitive physics in closed loop predictions on the CLEVRER dataset [93].
Dataset Splits | Yes | As described in Girdhar & Ramanan [32], we split the dataset with a ratio of 70:30 into a training and test set and further put aside 20% of the training set as a validation set, leaving 56% of the original data for training.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper mentions "OpenCV [9]" but does not specify a version number. It also mentions "rectified Adam optimization [RAdam, cf. 56]", but this is an optimization algorithm, not a specific software dependency with a version.
Experiment Setup | Yes | Loci is trained using a binary cross-entropy loss (L_BCE) pixel-wise on the frame prediction applying rectified Adam optimization [RAdam, cf. 56]. Several regularizing losses are added to foster object permanence. Additionally, to speed up learning, we use truncated backpropagation through time and enhance the gradients with an e-prop-like accumulation of previous neural activities [7]. ... Another important aspect for successful training is the use of a warmup phase, where we mask the target of the network with a threshold τ: ... After about 30 000 updates when the network has sufficiently learned to use the position encodings we switch from the masked foreground reconstruction to the full reconstruction. ... To blend in the supervision loss, we first set the supervision factor µ_s = 0.01 for the first 4 epochs, fostering mostly unsupervised training, and then set µ_s = 0.1 for the duration of the training process and to µ_s = 0.3 during the last epochs, to give the Snitch location a weak pull towards the target location.
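The Dataset Splits entry above implies a two-stage split: 30% of the data is held out for testing, and 20% of the remaining 70% becomes the validation set, which leaves 0.7 × 0.8 = 56% for training. The following is a minimal sketch of that arithmetic, not the authors' code. The function name `split_indices`, the random shuffling, the seed, and the floor rounding are all assumptions made for illustration; the actual CATER split follows Girdhar & Ramanan [32].

```python
import random


def split_indices(n_items, test_pct=30, val_pct_of_train=20, seed=0):
    """Hypothetical helper: two-stage train/val/test split by index.

    Integer percentages avoid floating-point rounding surprises.
    The shuffle and seed are illustrative assumptions, not from the paper.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)

    # Stage 1: 70:30 split into (train + val) and test.
    n_test = n_items * test_pct // 100
    n_trainval = n_items - n_test

    # Stage 2: put aside 20% of the training portion for validation.
    n_val = n_trainval * val_pct_of_train // 100

    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # 560 140 300
```

With 1000 items this yields 560 training, 140 validation, and 300 test items, matching the 56% figure quoted in the entry.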
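The Experiment Setup entry above describes two schedules: a supervision factor that steps from 0.01 to 0.1 to 0.3, and a warmup phase that ends after about 30 000 updates. A rough sketch of how such schedules might be expressed follows. It is not taken from the Loci repository. The function names, the zero-based epoch counter, and the parameters `total_epochs` and `final_epochs` are hypothetical, since the quoted text does not say how many epochs count as "the last epochs". The threshold masking itself is elided in the quoted text and is not reproduced here.

```python
WARMUP_UPDATES = 30_000  # "about 30 000 updates" per the quoted setup


def supervision_factor(epoch, total_epochs, final_epochs):
    """Hypothetical schedule for the supervision factor (zero-based epoch).

    final_epochs is an assumption: the source only says "the last epochs".
    """
    if epoch < 4:
        return 0.01  # first 4 epochs: mostly unsupervised training
    if epoch >= total_epochs - final_epochs:
        return 0.3   # last epochs: stronger pull towards the target location
    return 0.1       # remainder of training


def use_masked_target(update_step):
    """True while the warmup phase (masked foreground reconstruction) is active."""
    return update_step < WARMUP_UPDATES


# Example with assumed values: 100 epochs in total, the last 5 treated as final.
for epoch in (0, 3, 4, 94, 95):
    print(epoch, supervision_factor(epoch, total_epochs=100, final_epochs=5))
```

Keeping both schedules as pure functions of the epoch or update counter makes them easy to log and to check against the values stated in the entry.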