FSPool: Learning Set Representations with Featurewise Sort Pooling

Authors: Yan Zhang, Jonathon Hare, Adam Prügel-Bennett

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On a toy dataset of polygons and a set version of MNIST, we show that such an auto-encoder produces considerably better reconstructions and representations. Replacing the pooling function in existing set encoders with FSPool improves accuracy and convergence speed on a variety of datasets. (A minimal FSPool sketch follows the table.)
Researcher Affiliation | Academia | Yan Zhang, University of Southampton, Southampton, UK (yz5n12@ecs.soton.ac.uk); Jonathon Hare, University of Southampton, Southampton, UK (jsh2@ecs.soton.ac.uk); Adam Prügel-Bennett, University of Southampton, Southampton, UK (apb@ecs.soton.ac.uk)
Pseudocode | No | The paper describes algorithms using mathematical equations and prose but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Full results can be found in the appendices, experimental details can be found in Appendix H, and we provide our code for reproducibility at [redacted].
Open Datasets | Yes | Next, we turn to the harder task of auto-encoding MNIST images turned into sets of points... CLEVR (Johnson et al., 2017) is a visual question answering dataset... We perform a large number of experiments on various graph classification datasets from the TU repository (Kersting et al., 2016)... (A point-set conversion sketch follows the table.)
Dataset Splits | Yes | We repeat 10-fold cross-validation on each dataset 10 times... The best hyperparameters are selected based on best average validation accuracy across the 10-fold cross-validation, where one of the 9 training folds is used as validation set each time. (A cross-validation sketch follows the table.)
Hardware Specification | Yes | Training the FSPool model takes 45 seconds per epoch on a GTX 1080 GPU, only slightly more than the baselines with 37 seconds per epoch.
Software Dependencies | No | The paper mentions 'PyTorch' and the 'torch-geometric library (Fey et al., 2018)' but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | We use a batch size of 16 for all three models and train them for 10240 steps. We use the Adam optimiser (Kingma & Ba, 2015) with 0.001 learning rate and their suggested values for the other optimiser parameters (PyTorch defaults). Weights of linear and convolutional layers are initialised as suggested in Glorot & Bengio (2010). The size of every hidden layer is set to 16 and the latent space is set to 1 (it should only need to store the rotation as latent variable). (A training-setup sketch follows the table.)
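
The Research Type row summarises the paper's central operation, featurewise sort pooling: every feature dimension is sorted across the elements of the set, and the sorted values are combined with learned weights. The snippet below is a minimal sketch of that idea for fixed-size sets only; the class name is ours, and it omits the paper's piecewise-linear weight calibration that handles variable set sizes, so it is an illustration rather than the authors' implementation.

```python
import torch
import torch.nn as nn


class SimpleFSPool(nn.Module):
    """Minimal sketch of featurewise sort pooling for fixed-size sets.

    Input shape: (batch, channels, n_points). The paper's FSPool learns a
    continuous piecewise-linear weight function so the same parameters work
    for variable set sizes; here one weight per sorted position is learned,
    which only works when n_points is fixed.
    """

    def __init__(self, channels: int, n_points: int):
        super().__init__()
        # one learnable weight per (feature, sorted position)
        self.weight = nn.Parameter(torch.randn(channels, n_points) / n_points ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # sort every feature independently across the set dimension,
        # then take a weighted sum over the sorted positions
        sorted_x, _ = x.sort(dim=2, descending=True)
        return (sorted_x * self.weight).sum(dim=2)   # (batch, channels)


if __name__ == "__main__":
    pool = SimpleFSPool(channels=16, n_points=10)
    x = torch.randn(4, 16, 10)              # 4 sets of 10 points, 16 features each
    perm = torch.randperm(10)
    # pooling is permutation-invariant because of the per-feature sort
    assert torch.allclose(pool(x), pool(x[:, :, perm]))
```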
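
The Open Datasets row mentions auto-encoding MNIST images turned into sets of points. One way to build such a set version of MNIST is to threshold pixel intensities and keep the coordinates of the surviving pixels; the threshold value and the coordinate normalisation below are assumptions, not values taken from the paper.

```python
import torch


def image_to_point_set(img: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Turn a (28, 28) MNIST image with values in [0, 1] into a set of 2-d points.

    Sketch only: the threshold and the [0, 1] coordinate normalisation are
    assumptions, not taken from the paper.
    """
    ys, xs = torch.nonzero(img > threshold, as_tuple=True)
    # stack pixel coordinates into an (n_points, 2) set; n_points varies per image
    return torch.stack([xs, ys], dim=1).float() / 27.0


if __name__ == "__main__":
    fake_digit = (torch.rand(28, 28) > 0.8).float()   # stand-in for an MNIST image
    points = image_to_point_set(fake_digit)
    print(points.shape)                               # (n_points, 2)
```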
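
The Dataset Splits row describes the graph-classification protocol: 10 repeats of 10-fold cross-validation, with one of the 9 training folds held out as a validation set each time. Below is a sketch of that split logic, assuming scikit-learn's StratifiedKFold and a hypothetical train_and_eval callback; the hyperparameter selection over validation accuracies is not shown.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold


def repeated_10fold(X, y, train_and_eval, n_repeats=10, seed=0):
    """10 repeats of 10-fold cross-validation.

    Within each outer split, one of the 9 training folds is carved out as a
    validation set, mirroring the Dataset Splits row. `train_and_eval` is a
    hypothetical callback returning test accuracy for the given index sets.
    """
    accuracies = []
    for repeat in range(n_repeats):
        outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed + repeat)
        for train_val_idx, test_idx in outer.split(X, y):
            # one of the 9 remaining folds becomes the validation set
            inner = StratifiedKFold(n_splits=9, shuffle=True, random_state=seed + repeat)
            train_rel, val_rel = next(inner.split(X[train_val_idx], y[train_val_idx]))
            train_idx, val_idx = train_val_idx[train_rel], train_val_idx[val_rel]
            accuracies.append(train_and_eval(train_idx, val_idx, test_idx))
    return float(np.mean(accuracies)), float(np.std(accuracies))
```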
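
The Experiment Setup row lists the optimisation details for the polygon auto-encoder experiment. The sketch below wires those numbers into a PyTorch training skeleton; the model and the random stand-in data are placeholders, not the paper's architecture or loss.

```python
import torch
import torch.nn as nn

HIDDEN = 16      # size of every hidden layer
LATENT = 1       # latent space size (only needs to store the rotation)
BATCH_SIZE = 16
STEPS = 10240
LR = 1e-3


def init_glorot(module: nn.Module) -> None:
    """Glorot/Xavier init for linear and conv weights (bias handling is our assumption)."""
    if isinstance(module, (nn.Linear, nn.Conv1d, nn.Conv2d)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)


# Placeholder network: the paper's set auto-encoder is not reproduced here.
model = nn.Sequential(nn.Linear(2, HIDDEN), nn.ReLU(), nn.Linear(HIDDEN, LATENT))
model.apply(init_glorot)

# Adam with learning rate 0.001; the remaining parameters stay at PyTorch defaults.
optimiser = torch.optim.Adam(model.parameters(), lr=LR)

for step in range(STEPS):
    x = torch.randn(BATCH_SIZE, 2)        # stand-in for a batch of polygon points
    loss = model(x).pow(2).mean()         # stand-in loss; the paper uses a set reconstruction loss
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```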