Multiset-Equivariant Set Prediction with Approximate Implicit Differentiation
Authors: Yan Zhang, David W Zhang, Simon Lacoste-Julien, Gertjan J. Burghouts, Cees G. M. Snoek
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments in Section 4, we start by evaluating the first point by comparing exclusively multiset-equivariant models with both set-equivariant and non-equivariant models on synthetic data. The former completely fail on this simple toy task while the latter are significantly less sample-efficient. Next, we evaluate the second point by testing the modeling capacity on autoencoding random sets, where iDSPN performs similarly to DSPN at the same iteration count and much better for the same computational cost. Lastly, iDSPN significantly raises the bar on CLEVR object property prediction, outperforming the state-of-the-art Slot Attention (Locatello et al., 2020) by a large margin of 69 percentage points on the AP0.125 metric while only training for 7% of the number of epochs. |
| Researcher Affiliation | Collaboration | Yan Zhang (1), David W. Zhang (2), Simon Lacoste-Julien (1,3,4), Gertjan J. Burghouts (5), Cees G. M. Snoek (2). Affiliations: 1 Samsung SAIT AI Lab, Montreal; 2 University of Amsterdam; 3 Mila, Université de Montréal; 4 Canada CIFAR AI Chair; 5 TNO |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper. |
| Open Source Code | Yes | and open-source the code to reproduce all experiments at https://github.com/davzha/multiset-equivariance and in the supplementary material. |
| Open Datasets | Yes | On CLEVR object property prediction, we substantially improve over the state-of-the-art Slot Attention from 8% to 77% in one of the strictest evaluation metrics because of the benefits made possible by implicit differentiation. |
| Dataset Splits | Yes | The setting with 1× samples has a training dataset of size 640. We additionally use a validation set of size 6,400 and a test set of size 64,000 for every run. |
| Hardware Specification | Yes | Their model takes around 51 hours to train on a V100 GPU, while our model only takes 2 hours and 40 minutes on a V100 GPU (though with different infrastructure around it). |
| Software Dependencies | No | No specific version numbers for key software components like PyTorch or TensorFlow were found. The paper mentions "PyTorch (Paszke et al., 2019)", "TensorFlow (Abadi et al., 2015)", and "Weights & Biases tables (Biewald, 2020)", but without version numbers for these software packages. |
| Experiment Setup | Yes | We train the model for 100 epochs with the Adam optimizer (Kingma & Ba, 2015), a learning rate of 1e-3, and default momentum hyperparameters. ... We increase the batch size from 32 to 128. ... We use Nesterov's Accelerated Gradient (Nesterov, 1983) with a momentum parameter of 0.9 instead of standard gradient descent without momentum. ... Instead of fixing the number of iterations at 10 like DSPN, we set the number of iterations to 20 at the start of training and change it to 40 after 50 epochs. ... We drop the learning rate after 90 epochs from 1e-3 to 1e-4 for the last 10 epochs. ... Clipping the gradients in the inner optimization to a maximum L2 norm of 10 seemed to help. |
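
The experiment-setup excerpt lists concrete optimization choices: Adam at 1e-3 dropped to 1e-4 after 90 of 100 epochs, batch size 128, an inner loop of 20 Nesterov steps (40 after 50 epochs) with momentum 0.9, and inner gradients clipped to an L2 norm of 10. The sketch below is a minimal, hypothetical PyTorch rendering of those settings, not the authors' released code; `inner_optimization`, `grad_fn`, `outer_schedule`, and the unspecified inner-loop learning rate `inner_lr` are placeholders.

```python
import torch

def inner_optimization(initial_set, grad_fn, n_iters, inner_lr,
                       momentum=0.9, clip_norm=10.0):
    """Nesterov accelerated gradient on the predicted set, with inner-gradient clipping."""
    current = initial_set
    velocity = torch.zeros_like(current)
    for _ in range(n_iters):
        lookahead = current + momentum * velocity  # Nesterov look-ahead point
        grad = grad_fn(lookahead)                  # gradient of the inner (encoder-matching) loss
        grad_norm = grad.norm()
        if grad_norm > clip_norm:                  # clip inner gradient to a maximum L2 norm of 10
            grad = grad * (clip_norm / grad_norm)
        velocity = momentum * velocity - inner_lr * grad
        current = current + velocity
    return current

def outer_schedule(model, epoch):
    """Outer-loop settings from the excerpt: Adam at 1e-3 (1e-4 for the last 10
    of 100 epochs), batch size 128, 20 inner iterations becoming 40 after 50 epochs."""
    lr = 1e-3 if epoch < 90 else 1e-4
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # default Adam momentum hyperparameters
    n_inner_iters = 20 if epoch < 50 else 40
    batch_size = 128
    return optimizer, n_inner_iters, batch_size
```

The per-epoch branching in `outer_schedule` only mirrors the schedule stated in the excerpt; an actual implementation would more likely keep a single optimizer and apply a learning-rate scheduler rather than rebuilding Adam each epoch.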