Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects

Authors: Adam Kosiorek, Hyunjik Kim, Yee Whye Teh, Ingmar Posner

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate SQAIR on two datasets. Firstly, we perform an extensive evaluation on moving MNIST digits, where we show that it can learn to reliably detect, track and generate moving digits (Section 4.1). Moreover, we show that SQAIR can simulate moving objects into the future, an outcome it has not been trained for. We also study the utility of learned representations for a downstream task. Secondly, we apply SQAIR to real-world pedestrian CCTV data from static cameras (Duke MTMC, Ristani et al., 2016), where we perform background subtraction as pre-processing. The quantitative analysis consists of comparing all models in terms of the marginal log-likelihood log pθ(x1:T), evaluated as the L_IWAE bound with K = 1000 particles; reconstruction quality, evaluated as a single-sample approximation of E_qφ[log pθ(x1:T | z1:T)]; and the KL-divergence between the approximate posterior and the prior (Table 1).
Researcher Affiliation | Academia | Adam R. Kosiorek, Hyunjik Kim, Ingmar Posner, Yee Whye Teh. Applied Artificial Intelligence Lab, Oxford Robotics Institute, University of Oxford; Department of Statistics, University of Oxford. Corresponding author: adamk@robots.ox.ac.uk
Pseudocode | Yes | For details, see Algorithms 2 and 3 in Appendix A.
Open Source Code | Yes | Code for the implementation on the MNIST dataset and the results video are available online. Code: github.com/akosiorek/sqair
Open Datasets | Yes | We evaluate SQAIR on two datasets. Firstly, we perform an extensive evaluation on moving MNIST digits... Secondly, we apply SQAIR to real-world pedestrian CCTV data from static cameras (Duke MTMC, Ristani et al., 2016).
Dataset Splits | Yes | There are 60,000 training and 10,000 testing sequences created from the respective MNIST datasets. In this experiment, we use 3150 training and 350 validation sequences of length 5.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running experiments are provided in the paper.
Software Dependencies | No | The paper mentions optimizers (RMSPROP) and techniques (IWAE, VIMCO), but does not provide specific software library names with version numbers (e.g., TensorFlow 2.x, PyTorch 1.x) or programming language versions.
Experiment Setup | Yes | To optimise the above, we use RMSPROP, K = 5 and batch size of 32.
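The quantitative comparison above hinges on the importance-weighted (IWAE) bound on the marginal log-likelihood. As a minimal illustrative sketch (not the authors' implementation), the bound for one sequence can be computed from K importance log-weights with a numerically stable log-sum-exp; the function name `iwae_bound` is an assumption for this example:

```python
import math

def iwae_bound(log_weights):
    """Importance-weighted bound on log p(x): log((1/K) * sum_k w_k).

    log_weights[k] = log p(x, z_k) - log q(z_k | x) for K particles z_k ~ q.
    Computed with the log-sum-exp trick for numerical stability.
    """
    k = len(log_weights)
    m = max(log_weights)  # shift by the max before exponentiating
    return m + math.log(sum(math.exp(lw - m) for lw in log_weights)) - math.log(k)

# With K = 1 this reduces to the single-sample ELBO term; larger K gives a
# tighter lower bound on the marginal log-likelihood.
```

Per the paper, training optimises this objective with RMSProp, K = 5 and a batch size of 32, while evaluation in Table 1 uses the same estimator with K = 1000 particles.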