Deep Learning with Sets and Point Clouds

Authors: Siamak Ravanbakhsh, Jeff Schneider, Barnabas Poczos

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We use deep permutation-invariant networks to perform point-cloud classification and MNIST-digit summation, where in both cases the output is invariant to permutations of the input. In a semi-supervised setting, where the goal is to make predictions for each instance within a set, we demonstrate the usefulness of this type of layer in set-outlier detection as well as semi-supervised learning with clustering side-information." Figure 1: Classification accuracy of different schemes in predicting the sum of (left) N=3 and (right) N=6 MNIST digits without access to individual image labels; the training set was fixed to 10,000 sets. Table 1: Classification accuracy and the size of the representation used by different methods on the ModelNet40 dataset.
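The excerpt above rests on permutation invariance: a set network built from a shared per-element transform and a symmetric pooling operation produces the same output under any reordering of its input. A minimal NumPy sketch of this property (illustrative weights and names, not the authors' TensorFlow code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy permutation-invariant set network: a shared per-element transform,
# sum-pooling over the set, then an output map (all weights illustrative).
W1 = rng.normal(size=(4, 8))   # per-element transform: 4-dim input -> 8 features
W2 = rng.normal(size=(8,))     # output map: pooled features -> scalar

def set_network(X):
    """X: (N, 4) array, one row per set element; returns a scalar."""
    phi = np.tanh(X @ W1)      # shared transform applied to every element
    pooled = phi.sum(axis=0)   # sum-pooling discards element order
    return float(pooled @ W2)

X = rng.normal(size=(6, 4))
perm = rng.permutation(6)
# The output is unchanged under any permutation of the set elements.
assert np.isclose(set_network(X), set_network(X[perm]))
```

Because the only interaction between set elements is the symmetric sum, any reordering of the rows of `X` leaves the output bit-for-bit identical.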
Researcher Affiliation | Academia | Siamak Ravanbakhsh, Jeff Schneider & Barnabás Póczos, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA. {mravanba,jeff.schneider,bapoczos}@cs.cmu.edu
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | No statement or link providing concrete access to the source code for the paper's methodology was found. The paper mentions using TensorFlow, but does not release its own implementation code.
Open Datasets | Yes | MNIST (LeCun et al., 1998); CelebA (Liu et al., 2015), which contains 202,599 face images; ModelNet40 (Wu et al., 2015); and the redMaPPer galaxy cluster catalog (Rozo & Rykoff, 2014).
Dataset Splits | Yes | "We randomly sample a subset of N images from this dataset to build 10,000 sets of training and 10,000 sets of validation images, where the set-label is the sum of digits in that set (i.e., individual labels per image are unavailable)."
Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or cloud instances) used for the experiments were provided in the paper.
Software Dependencies | No | "All our implementations use TensorFlow (Abadi et al., 2016)." No version numbers or further dependencies are given.
Experiment Setup | Yes | MNIST summation: all models have 4 convolution layers followed by max-pooling. The convolution layers have 16-32-64-128 output channels, respectively, with 5×5 receptive fields. Each pooling, fully-connected, and set layer is followed by 20% dropout. For optimization, a learning rate of .0003 was used with Adam and the default β1 = .9 and β2 = .999. Point-cloud classification: the model has 9 convolution layers with 3×3 receptive fields: convolution layers with 32, 32, 64 feature maps followed by max-pooling, then 2D convolution layers with 64, 64, 128 feature maps followed by another max-pooling layer; the final convolution layers have 128, 128, 256 feature maps, followed by a max-pooling layer with pool size 5 that reduces the output dimension to batch-size × N × 256, where the set size N = 16. This is then forwarded to three permutation-equivariant layers with 256, 128, and 1 output channels. Exponential linear units (Clevert et al., 2015) are used, with a 20% dropout rate at convolutional layers and a 50% dropout rate at the first two set layers; when applied to set layers, the selected feature (channel) is simultaneously dropped in all the set members of that particular set. Adam (Kingma & Ba, 2014) is used for optimization, with batch normalization only in the convolutional layers, and mini-batches of 8 sets, for a total of 128 images per batch. Red-shift estimation: four permutation-equivariant layers with 128, 128, 128, and 1 output channels, respectively, where the output of the last layer is used as the red-shift estimate; the squared loss of the prediction for available spectroscopic red-shifts is minimized. Mini-batches of size 128 and Adam with a learning rate of .001, β1 = .9 and β2 = .999 are used. All layers except the last use tanh units and simultaneous dropout with a 50% dropout rate.
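The permutation-equivariant layers and the "simultaneous" dropout described in the setup can be sketched as follows. This NumPy sketch assumes the max-normalization form of the layer, σ(β + (x − 1·maxpool(x))Γ); the parameter names and shapes are illustrative, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Permutation-equivariant layer in the max-normalization form
# sigma(beta + (x - 1 * maxpool(x)) @ Gamma); all parameters illustrative.
Gamma = rng.normal(size=(5, 3))
beta = rng.normal(size=(3,))

def perm_equivariant(X):
    """X: (N, 5) -> (N, 3); permuting rows of X permutes rows of the output."""
    pooled = X.max(axis=0, keepdims=True)  # (1, 5), shared by all set members
    return np.tanh(beta + (X - pooled) @ Gamma)

def simultaneous_dropout(X, rate, rng):
    """Drop the same channels in every set member, as the setup describes."""
    mask = (rng.random(X.shape[1]) >= rate).astype(X.dtype)
    return X * mask / (1.0 - rate)

X = rng.normal(size=(4, 5))
perm = rng.permutation(4)
# Equivariance: permuting the input permutes the output in the same way.
assert np.allclose(perm_equivariant(X)[perm], perm_equivariant(X[perm]))
```

Because the max-pooled vector is itself invariant to the ordering of the set, subtracting it from every element commutes with any row permutation, which is exactly the equivariance property the assert checks.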