Exploiting Spatial Invariance for Scalable Unsupervised Object Tracking

Authors: Eric Crawford, Joelle Pineau

AAAI 2020, pp. 3684-3692

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In a series of experiments, we demonstrate a number of attractive features of our architecture; most notably, that it outperforms competing methods at tracking objects in cluttered scenes with many objects, and that it can generalize well to videos that are larger and/or contain more objects than videos encountered during training."
Researcher Affiliation | Academia | Eric Crawford, Mila / McGill University, Montreal, QC, Canada, eric.crawford@mail.mcgill.ca; Joelle Pineau, Mila / McGill University, Montreal, QC, Canada, jpineau@cs.mcgill.ca
Pseudocode | No | The paper describes its architecture and processes using natural language and diagrams, but does not include any formal pseudocode or algorithm blocks.
Open Source Code | Yes | "Code for running these experiments is available online at github.com/e2crawfo/silot."
Open Datasets | No | "In this experiment, each 8-frame video is generated by first selecting a number of MNIST digits to include and sampling that number of digits." and "Here each video has spatial size 96 x 96, and contains a number of moving monochrome shapes. We use 6 different colors and 5 different shapes." No public access information is provided for these specific generated datasets (a generation sketch in the same spirit follows the table).
Dataset Splits | No | The paper describes training and test conditions (e.g., "training on videos containing 1-6 digits", "At test time, we use full 96x96 videos"), but does not explicitly specify training/validation/test splits or their sizes.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions various neural network architectures and algorithms (e.g., Variational Autoencoder (VAE), convolutional neural network, Spatial Transformer Networks), but does not specify any software dependencies with version numbers.
Experiment Setup | Yes | "We begin by training on only the first 2 frames of each video. We then increase the number of training frames by 2 every N_curric update steps." and "Each timestep other than t = 0, the entire discovery module is turned off with probability p_dd. Throughout this work we use p_dd = 0.5." (A sketch of this schedule follows the table.)
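Because the moving-shapes data is procedurally generated rather than released, a reproduction has to re-implement the generator from the quoted description. Below is a minimal sketch in Python/NumPy that produces 8-frame 96 x 96 videos of moving monochrome shapes with 5 shape types and 6 colors; the sprite renderer, object sizes, velocity range, wrap-around motion model, and all function names are assumptions not specified in the paper.

```python
import numpy as np

def make_shape_sprite(kind, size=14):
    # Binary mask for one of five simple shapes. The exact shape set and
    # renderer are assumptions; the paper only says "5 different shapes".
    y, x = np.mgrid[0:size, 0:size].astype(float)
    c = (size - 1) / 2
    if kind == "circle":
        mask = (y - c) ** 2 + (x - c) ** 2 <= (size / 2 - 1) ** 2
    elif kind == "square":
        mask = np.ones((size, size), dtype=bool)
    elif kind == "diamond":
        mask = np.abs(y - c) + np.abs(x - c) <= size / 2 - 1
    elif kind == "cross":
        mask = (np.abs(y - c) <= 2) | (np.abs(x - c) <= 2)
    else:  # "triangle": a simple right triangle
        mask = x <= y
    return mask.astype(np.float32)

def make_video(n_objects, n_frames=8, hw=96, seed=None):
    # One video of moving monochrome shapes: each object gets a shape, a
    # color, a start position, and a constant velocity (assumed linear
    # motion with wrap-around at the borders).
    rng = np.random.default_rng(seed)
    shapes = ["circle", "square", "diamond", "cross", "triangle"]
    colors = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                       [1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=np.float32)
    video = np.zeros((n_frames, hw, hw, 3), dtype=np.float32)
    for _ in range(n_objects):
        sprite = make_shape_sprite(rng.choice(shapes))
        color = colors[rng.integers(len(colors))]
        s = sprite.shape[0]
        pos = rng.uniform(0, hw - s, size=2)   # top-left corner
        vel = rng.uniform(-2, 2, size=2)       # pixels/frame (assumed range)
        for t in range(n_frames):
            r, c = (pos + t * vel).astype(int) % (hw - s)
            video[t, r:r + s, c:c + s] += sprite[..., None] * color
    return video.clip(0.0, 1.0)

# e.g., training-style videos with 1-6 objects each
videos = [make_video(n_objects=n, seed=i) for i, n in enumerate([1, 3, 6])]
```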
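The two training details quoted in the Experiment Setup row are simple to pin down in code. The sketch below expresses the frame curriculum and the discovery dropout as standalone helpers; the value of N_curric is a placeholder (the quoted text does not report the number the authors used), and the function names are hypothetical.

```python
import random

N_CURRIC = 10_000  # update steps between curriculum increments; placeholder value
P_DD = 0.5         # discovery-dropout probability, as stated in the paper

def n_training_frames(step, total_frames=8, start=2, increment=2):
    # Frame curriculum: begin with the first 2 frames of each video, then
    # add 2 more every N_CURRIC update steps, capped at the video length.
    return min(total_frames, start + increment * (step // N_CURRIC))

def discovery_enabled(t, rng=random):
    # Discovery dropout: at every timestep except t = 0, the discovery
    # module is switched off with probability P_DD.
    return t == 0 or rng.random() >= P_DD

# Example: at update step 25,000 the model trains on min(8, 2 + 2*2) = 6 frames.
assert n_training_frames(25_000) == 6
```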