Tagger: Deep Unsupervised Perceptual Grouping

Authors: Klaus Greff, Antti Rasmus, Mathias Berglund, Tele Hao, Harri Valpola, Jürgen Schmidhuber

NeurIPS 2016

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on multi-digit classification of very cluttered images that require texture segmentation. Remarkably our method achieves improved classification performance over convolutional networks despite being fully connected, by making use of the grouping mechanism. Furthermore, we observe that our system greatly improves upon the semi-supervised result of a baseline Ladder network on our dataset. |
| Researcher Affiliation | Collaboration | The Curious AI Company {antti,mathias,hotloo,harri}@cai.fi; IDSIA {klaus,juergen}@idsia.ch |
| Pseudocode | Yes | Algorithm 1: Pseudocode for running Tagger on a single real-valued example x. (a hedged sketch of this loop appears below) |
| Open Source Code | Yes | The datasets and a Theano [33] reference implementation of Tagger are available at http://github.com/CuriousAI/tagger |
| Open Datasets | Yes | The datasets and a Theano [33] reference implementation of Tagger are available at http://github.com/CuriousAI/tagger |
| Dataset Splits | Yes | It consists of 60,000 (train) + 10,000 (test) binary images of size 20x20. (...) We use a 50k training set, 10k validation set, and 10k test set to report the results. (see the split sketch below) |
| Hardware Specification | Yes | The models reported in this paper took approximately 3 and 11 hours in wall clock time on a single Nvidia Titan X GPU for the Shapes and Texture MNIST2 datasets respectively. |
| Software Dependencies | No | The paper mentions 'a Theano [33] reference implementation of Tagger' but does not specify version numbers for Theano or other key software dependencies used in the experiments. |
| Experiment Setup | Yes | We train Tagger in an unsupervised manner by only showing the network the raw input example x, not ground truth masks or any class labels, using 4 groups and 3 iterations. We average the cost over iterations and use ADAM [14] for optimization. On the Shapes dataset we trained for 100 epochs with a bit-flip probability of 0.2, and on the Texture MNIST dataset for 200 epochs with a corruption-noise standard deviation of 0.2. (see the configuration sketch below) |
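
The Pseudocode row references Algorithm 1, which runs Tagger's iterative grouping on one example. Below is a minimal NumPy sketch of that loop under several stated assumptions: the trained Ladder-network mapping is replaced by an untrained placeholder `parametric_update`, a Gaussian corruption model is assumed, and the likelihood terms are simplified. It illustrates the structure (K groups, iterative refinement, masks renormalized across groups each step), not the paper's actual implementation; consult the reference code at the repository above for the real thing.

```python
import numpy as np

def softmax(logits, axis=0):
    logits = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=axis, keepdims=True)

def parametric_update(z, delta_z, m, xi):
    # Placeholder for the trained Ladder network Tagger actually uses:
    # it maps current estimates plus likelihood terms to new estimates.
    return z + 0.1 * delta_z, np.log(m + 1e-8) + xi

def tagger_inference(x, num_groups=4, num_iters=3, sigma=0.2, seed=0):
    """Iterative grouping on a single flattened real-valued example x."""
    rng = np.random.default_rng(seed)
    x_tilde = x + rng.normal(0.0, sigma, size=x.shape)   # corrupted input
    z = np.full((num_groups, x.size), x_tilde.mean())    # group reconstructions
    m = softmax(rng.normal(size=(num_groups, x.size)))   # soft group assignments
    for _ in range(num_iters):
        # Gaussian-likelihood terms (an assumption here): how each group's
        # reconstruction should move, and how well it explains each pixel.
        delta_z = m * (x_tilde - z) / sigma ** 2
        xi = -0.5 * ((x_tilde - z) / sigma) ** 2
        z, m_logits = parametric_update(z, delta_z, m, xi)
        m = softmax(m_logits)            # masks stay normalized over groups
    return z, m

z, m = tagger_inference(np.random.default_rng(1).random(400))  # a 20x20 image
```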
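
The Dataset Splits row quotes 60,000 train + 10,000 test images, with results reported on a 50k/10k/10k train/validation/test split. A minimal sketch of carving the validation set out of the 60k training portion follows; the quote does not say whether the original split was shuffled or a fixed tail, so the random permutation here is an assumption.

```python
import numpy as np

def make_splits(images, labels, n_train=50_000, n_val=10_000, seed=0):
    """Split the 60k training portion into 50k train / 10k validation.
    The separate 10k test set is used as distributed."""
    order = np.random.default_rng(seed).permutation(len(images))
    train, val = order[:n_train], order[n_train:n_train + n_val]
    return (images[train], labels[train]), (images[val], labels[val])
```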
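
The Experiment Setup row pins down most training hyperparameters. Collected below as a configuration sketch: the field names are illustrative, but the values are exactly those reported in the quote.

```python
# Field names are illustrative; values come from the Experiment Setup row.
# Training is unsupervised: only raw inputs x are shown to the network,
# and the denoising cost is averaged over the 3 iterations.
TAGGER_CONFIG = {
    "common": {
        "groups": 4,
        "iterations": 3,
        "optimizer": "ADAM",              # ADAM [14]
        "average_cost_over_iterations": True,
        "supervision": None,              # no masks, no class labels
    },
    "shapes": {
        "epochs": 100,
        "corruption": {"type": "bitflip", "probability": 0.2},
    },
    "texture_mnist": {
        "epochs": 200,
        "corruption": {"type": "gaussian", "std": 0.2},
    },
}
```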