Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
Authors: S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, David Szepesvari, Koray Kavukcuoglu, Geoffrey E. Hinton
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that such models learn to identify multiple objects, counting, locating and classifying the elements of a scene, without any supervision, e.g., decomposing 3D images with various numbers of objects in a single forward pass of a neural network at unprecedented speed. We further show that the networks produce accurate inferences when compared to supervised counterparts, and that their structure leads to improved generalization. |
| Researcher Affiliation | Industry | S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, David Szepesvari, Koray Kavukcuoglu, Geoffrey E. Hinton {aeslami,heess,theophane,tassa,dsz,korayk,geoffhinton}@google.com Google DeepMind, London, UK |
| Pseudocode | No | The paper describes the inference and learning processes algorithmically but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper states that 'A video showing real-time inference using AIR has been included in the supplementary material.' However, it does not explicitly mention releasing source code for the described methodology or provide a link to a code repository. |
| Open Datasets | Yes | We demonstrate these capabilities on MNIST digits (Sec. 3.1), overlapping sprites and Omniglot glyphs (appendices H and G). |
| Dataset Splits | No | The paper describes training and testing splits (e.g., 'training on images each containing 0, 1 or 2 digits and then testing on images containing 3 digits' and 'We split this data into training and test'), but does not explicitly detail a separate validation split with specific percentages or counts. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., specific libraries, frameworks, or programming language versions) used in the experiments. |
| Experiment Setup | No | The paper mentions general training configurations such as 'train AIR with N=3', 'maximising L with respect to the parameters', and 'We train a single-step (N=1) AIR inference network for this task', but does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed optimizer settings. |
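As the Pseudocode row notes, the paper describes inference algorithmically without a structured algorithm block. A minimal Python sketch of the recurrent attend-infer-repeat control flow described in the paper, using hypothetical stand-in modules (fixed random linear maps rather than learned networks, and illustrative latent dimensions) purely to show the variable-length inference loop:

```python
import numpy as np

rng = np.random.default_rng(0)

def air_inference(image, max_steps=3, hidden_dim=8):
    """Sketch of AIR's recurrent inference loop (hypothetical shapes/modules).

    At each step the network decides whether another object is present
    (z_pres) and, if so, infers its pose (z_where) and appearance (z_what).
    Real AIR parameterises these with learned networks; here a single
    fixed random linear map illustrates the control flow only.
    """
    h = np.zeros(hidden_dim)  # recurrent state carried across steps
    W = rng.normal(size=(hidden_dim, hidden_dim + image.size))
    objects = []
    for _ in range(max_steps):
        # Update recurrent state from the image and the previous state.
        h = np.tanh(W @ np.concatenate([h, image.ravel()]))
        # Bernoulli "presence" variable: inference stops when it samples 0,
        # which is how AIR handles a variable number of objects.
        p_pres = 1.0 / (1.0 + np.exp(-h[0]))
        if rng.random() >= p_pres:
            break
        z_where = h[1:4]   # e.g. a scale and a 2-D translation
        z_what = h[4:]     # appearance code for this object
        objects.append((z_where, z_what))
    return objects         # variable-length latent scene description

scene = rng.normal(size=(8, 8))
latents = air_inference(scene)
print(f"inferred {len(latents)} object(s)")
```

This mirrors why the reviewed paper can report a fixed training budget like N=3: the loop has a hard cap on steps, but the learned stopping variable lets inference terminate early on simpler scenes.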