Exploiting Spatial Invariance for Scalable Unsupervised Object Tracking
Authors: Eric Crawford, Joelle Pineau
AAAI 2020, pp. 3684–3692
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In a series of experiments, we demonstrate a number of attractive features of our architecture; most notably, that it outperforms competing methods at tracking objects in cluttered scenes with many objects, and that it can generalize well to videos that are larger and/or contain more objects than videos encountered during training. |
| Researcher Affiliation | Academia | Eric Crawford, Mila/McGill University, Montreal, QC, Canada, eric.crawford@mail.mcgill.ca; Joelle Pineau, Mila/McGill University, Montreal, QC, Canada, jpineau@cs.mcgill.ca |
| Pseudocode | No | The paper describes its architecture and processes using natural language and diagrams, but does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for running these experiments is available online at github.com/e2crawfo/silot. |
| Open Datasets | No | "In this experiment, each 8-frame video is generated by first selecting a number of MNIST digits to include and sampling that number of digits." and "Here each video has spatial size 96×96, and contains a number of moving monochrome shapes. We use 6 different colors and 5 different shapes." No public access information is provided for these generated datasets (see the data-generation sketch after the table). |
| Dataset Splits | No | The paper describes training and test conditions (e.g., 'training on videos containing 1–6 digits', 'At test time, we use full 96x96 videos'), but does not explicitly specify training/validation/test splits or their sizes. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions various neural network architectures and algorithms (e.g., 'Variational Autoencoder (VAE)', 'convolutional neural network', 'Spatial Transformer Networks'), but does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | "We begin by training on only the first 2 frames of each video. We then increase the number of training frames by 2 every N_curric update steps." and "Each timestep other than t = 0, the entire discovery module is turned off with probability p_dd. Throughout this work we use p_dd = 0.5." (See the training-schedule sketch after the table.) |
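
The generated datasets are not released, but the quoted descriptions are concrete enough to sketch. Below is a minimal, illustrative generator for moving-digit videos of the kind described (bouncing 28×28 sprites on a 96×96 canvas over 8 frames), assuming NumPy and a preloaded MNIST array; the function and variable names are hypothetical and not taken from the authors' code.

```python
import numpy as np

def make_video(digits, n_frames=8, size=96, max_digits=6, rng=None):
    """Render one moving-digit video: sample 1..max_digits MNIST sprites,
    give each a random position and velocity, and bounce them off the
    edges of a size x size canvas for n_frames frames.

    `digits` is assumed to be a uint8 array of shape (N, 28, 28)."""
    rng = rng or np.random.default_rng()
    n = int(rng.integers(1, max_digits + 1))          # digits in this video
    sprites = digits[rng.choice(len(digits), n)]      # (n, 28, 28)
    sprites = sprites.astype(np.float32) / 255.0
    h = sprites.shape[1]
    pos = rng.uniform(0, size - h, (n, 2))            # top-left corners
    vel = rng.uniform(-3, 3, (n, 2))                  # pixels per frame

    video = np.zeros((n_frames, size, size), dtype=np.float32)
    for t in range(n_frames):
        for i, sprite in enumerate(sprites):
            r, c = pos[i].astype(int)
            # composite overlapping digits with a per-pixel max
            video[t, r:r + h, c:c + h] = np.maximum(
                video[t, r:r + h, c:c + h], sprite)
        pos += vel
        for d in range(2):                            # flip velocity at the edges
            hit = (pos[:, d] < 0) | (pos[:, d] > size - h)
            vel[hit, d] *= -1
        pos = np.clip(pos, 0, size - h)
    return video
```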
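The Experiment Setup row likewise pins down two reproducible training details: the frame curriculum (start with the first 2 frames, add 2 more every N_curric update steps) and discovery dropout (disable the discovery module with probability p_dd = 0.5 at every timestep except t = 0). Here is a minimal sketch of both schedules; the N_CURRIC value is a placeholder, since the paper's setting is not quoted in this report.

```python
import numpy as np

N_CURRIC = 1000  # hypothetical; the paper's N_curric value is not quoted here
P_DD = 0.5       # discovery-dropout probability, as stated in the paper

def n_training_frames(step, start=2, increment=2, max_frames=8):
    """Frame curriculum: train on the first 2 frames, then add 2 more
    every N_CURRIC update steps, up to the full 8-frame videos."""
    return min(start + increment * (step // N_CURRIC), max_frames)

def discovery_mask(n_frames, rng):
    """Discovery dropout: the discovery module runs where the mask is
    True; it is disabled with probability P_DD at every t except t = 0."""
    mask = rng.random(n_frames) >= P_DD
    mask[0] = True  # discovery is always active on the first frame
    return mask

# Usage sketch inside a training loop:
rng = np.random.default_rng(0)
for step in range(3000):
    t_frames = n_training_frames(step)           # frames used at this step
    run_discovery = discovery_mask(t_frames, rng)
    # ... unroll the model over t_frames frames, skipping discovery
    # wherever run_discovery[t] is False ...
```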