Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects
Authors: Adam Kosiorek, Hyunjik Kim, Yee Whye Teh, Ingmar Posner
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate SQAIR on two datasets. Firstly, we perform an extensive evaluation on moving MNIST digits, where we show that it can learn to reliably detect, track and generate moving digits (Section 4.1). Moreover, we show that SQAIR can simulate moving objects into the future, an outcome it has not been trained for. We also study the utility of learned representations for a downstream task. Secondly, we apply SQAIR to real-world pedestrian CCTV data from static cameras (Duke MTMC, Ristani et al., 2016), where we perform background subtraction as pre-processing. The quantitative analysis consists of comparing all models in terms of the marginal log-likelihood log p_θ(x_{1:T}) evaluated as the IWAE bound with K = 1000 particles, reconstruction quality evaluated as a single-sample approximation of E_qφ[log p_θ(x_{1:T} | z_{1:T})] and the KL-divergence between the approximate posterior and the prior (Table 1). |
| Researcher Affiliation | Academia | Adam R. Kosiorek, Hyunjik Kim, Ingmar Posner, Yee Whye Teh; Applied Artificial Intelligence Lab, Oxford Robotics Institute, University of Oxford; Department of Statistics, University of Oxford. Corresponding author: adamk@robots.ox.ac.uk |
| Pseudocode | Yes | For details, see Algorithms 2 and 3 in Appendix A. |
| Open Source Code | Yes | Code for the implementation on the MNIST dataset and the results video are available online. Code: github.com/akosiorek/sqair |
| Open Datasets | Yes | We evaluate SQAIR on two datasets. Firstly, we perform an extensive evaluation on moving MNIST digits... Secondly, we apply SQAIR to real-world pedestrian CCTV data from static cameras (Duke MTMC, Ristani et al., 2016) |
| Dataset Splits | Yes | There are 60,000 training and 10,000 testing sequences created from the respective MNIST datasets. In this experiment, we use 3150 training and 350 validation sequences of length 5. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running experiments are provided in the paper. |
| Software Dependencies | No | The paper mentions optimizers (RMSPROP) and techniques (IWAE, VIMCO), but does not provide specific software library names with version numbers (e.g., TensorFlow 2.x, PyTorch 1.x) or programming language versions. |
| Experiment Setup | Yes | To optimise the above, we use RMSPROP, K = 5 and batch size of 32. |
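The marginal log-likelihood figures discussed in the table are importance-weighted (IWAE) estimates. The following is a minimal NumPy sketch of that estimator, not the authors' code: the per-sample log importance weights (log p_θ(x, z_k) − log q_φ(z_k | x)) are assumed to have already been computed by the model.

```python
import numpy as np

def iwae_bound(log_weights):
    """IWAE estimate of log p(x): log((1/K) * sum_k exp(log_weights[k])),
    where log_weights[k] = log p(x, z_k) - log q(z_k | x) for K samples
    z_k ~ q(z | x). Computed stably via the log-sum-exp trick."""
    log_weights = np.asarray(log_weights, dtype=float)
    m = log_weights.max()
    return m + np.log(np.exp(log_weights - m).mean())
```

With K = 1 this reduces to a single-sample ELBO estimate; the paper reports evaluation with K = 1000 particles and training with K = 5.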