Learning Representations of Sets through Optimized Permutations

Authors: Yan Zhang, Jonathon Hare, Adam Prügel-Bennett

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In four different experiments, we show improvements over existing methods (section 4)." "[...] on which we achieve state-of-the-art results: number sorting, image mosaics, classification from image mosaics, and visual question answering."
Researcher Affiliation | Academia | "Yan Zhang & Adam Prügel-Bennett & Jonathon Hare, Department of Electronics and Computer Science, University of Southampton, {yz5n12,apb,jsh2}@ecs.soton.ac.uk"
Pseudocode | Yes | "A PSEUDOCODE OF ALGORITHM" (Appendix A of the paper)
Open Source Code | Yes | "Precise experimental details can be found in Appendix F and our implementation for all experiments is available at https://github.com/Cyanogenoid/perm-optim for full reproducibility."
Open Datasets | Yes | "We take these images from either MNIST, CIFAR10, or a version of ImageNet with images resized down to 64×64 pixels." "We use the VQA v2 dataset (Antol et al., 2015; Goyal et al., 2017)." (See the dataset-loading sketch after the table.)
Dataset Splits | Yes | "We use the VQA v2 dataset (Antol et al., 2015; Goyal et al., 2017), which in total contains around 1 million questions about 200,000 images from MS-COCO with 6.5 million human-provided answers available for training." "Our results on the validation set of VQA v2 are shown in Table 3."
Hardware Specification | No | The paper mentions "GPU memory requirements" but does not specify any particular GPU model, CPU, or other hardware components used for running experiments.
Software Dependencies | No | "All of our experiments can be reproduced using our implementation at https://github.com/Cyanogenoid/perm-optim in PyTorch (Paszke et al., 2017)"
Experiment Setup | Yes | "All of our experiments can be reproduced using our implementation at https://github.com/Cyanogenoid/perm-optim in PyTorch (Paszke et al., 2017) through the experiments/all.sh script." Shared hyperparameters for the first three experiments: optimiser Adam (Kingma & Ba, 2015) with PyTorch defaults (β1 = 0.9, β2 = 0.999, ε = 10^-8); initial step size η in inner gradient descent: 1.0. Number sorting: inner gradient descent steps T = 6; Adam learning rate 0.1; batch size 512; 2^18 sets to sort in the training set. Image mosaics: Adam learning rate 10^-3; inner gradient descent steps T = 4; batch size 32; training epochs 20 (MNIST, CIFAR10) or 1 (ImageNet); hidden dimension of F: 64 (MNIST, CIFAR10) or 128 (ImageNet).
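
For orientation, the hyperparameters quoted in the Experiment Setup row map onto an ordinary PyTorch training loop. The sketch below is only illustrative: the linear model and the random sorting data are placeholders rather than the authors' permutation-optimisation network, and the inner-optimisation settings (T, η) are merely recorded as constants; the authoritative configuration is in the repository's experiments/all.sh.

    # Illustrative sketch of the number-sorting training configuration quoted above.
    # The linear model and random data are placeholders, NOT the authors' network;
    # only the optimiser settings, batch size, and dataset size mirror the paper.
    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    set_size = 5                              # illustrative set size, not from the paper
    model = nn.Linear(set_size, set_size)     # placeholder for the permutation model

    # Adam with PyTorch defaults (beta1=0.9, beta2=0.999, eps=1e-8) and the quoted
    # outer learning rate of 0.1 used for number sorting.
    optimiser = torch.optim.Adam(model.parameters(), lr=0.1, betas=(0.9, 0.999), eps=1e-8)

    # 2^18 randomly generated sets, batch size 512, as quoted above.
    x = torch.rand(2**18, set_size)
    y = torch.sort(x, dim=1).values
    loader = DataLoader(TensorDataset(x, y), batch_size=512, shuffle=True)

    T = 6            # inner gradient descent steps (not used by the placeholder model)
    inner_lr = 1.0   # initial inner step size eta (likewise only recorded here)

    for inputs, targets in loader:
        optimiser.zero_grad()
        loss = nn.functional.mse_loss(model(inputs), targets)
        loss.backward()
        optimiser.step()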
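
Similarly, the image datasets listed in the Open Datasets row are publicly available. A minimal loading sketch, assuming torchvision is installed; the 64×64 ImageNet variant and VQA v2 are not bundled with torchvision and require separate downloads.

    # Minimal sketch: MNIST and CIFAR10 download directly via torchvision; the
    # 64x64 ImageNet variant and VQA v2 must be obtained separately.
    from torchvision import datasets, transforms

    to_tensor = transforms.ToTensor()
    mnist = datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)
    cifar10 = datasets.CIFAR10(root="data", train=True, download=True, transform=to_tensor)
    print(len(mnist), len(cifar10))  # 60000, 50000 training images respectively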