Multiset-Equivariant Set Prediction with Approximate Implicit Differentiation
Authors: Yan Zhang, David W Zhang, Simon Lacoste-Julien, Gertjan J. Burghouts, Cees G. M. Snoek
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments in Section 4, we start by evaluating the first point by comparing exclusively multiset-equivariant models with both set-equivariant and non-equivariant models on synthetic data. The former completely fail on this simple toy task while the latter are significantly less sample-efficient. Next, we evaluate the second point by testing the modeling capacity on autoencoding random sets, where iDSPN performs similarly to DSPN at the same iteration count and much better for the same computational cost. Lastly, iDSPN significantly raises the bar on CLEVR object property prediction, outperforming the state-of-the-art Slot Attention (Locatello et al., 2020) by a large margin of 69 percentage points on the AP0.125 metric while only training for 7% of the number of epochs. |
| Researcher Affiliation | Collaboration | Yan Zhang (1), David W. Zhang (2), Simon Lacoste-Julien (1,3,4), Gertjan J. Burghouts (5), Cees G. M. Snoek (2). Affiliations: 1 Samsung SAIT AI Lab, Montreal; 2 University of Amsterdam; 3 Mila, Université de Montréal; 4 Canada CIFAR AI Chair; 5 TNO |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block was found in the paper. |
| Open Source Code | Yes | and open-source the code to reproduce all experiments at https://github.com/davzha/multiset-equivariance and in the supplementary material. |
| Open Datasets | Yes | On CLEVR object property prediction, we substantially improve over the state-of-the-art Slot Attention from 8% to 77% in one of the strictest evaluation metrics because of the benefits made possible by implicit differentiation. |
| Dataset Splits | Yes | The setting with 1× samples has a training dataset of size 640. We additionally use a validation set of size 6,400 and a test set of size 64,000 for every run. |
| Hardware Specification | Yes | Their model takes around 51 hours to train on a V100 GPU, while our model only takes 2 hours and 40 minutes on a V100 GPU (though with different infrastructure around it). |
| Software Dependencies | No | No specific version numbers for key software components like PyTorch or TensorFlow were found. The paper mentions "PyTorch (Paszke et al., 2019)", "TensorFlow (Abadi et al., 2015)", and "Weights & Biases tables (Biewald, 2020)", but without version numbers for these software packages. |
| Experiment Setup | Yes | We train the model for 100 epochs with the Adam optimizer (Kingma & Ba, 2015), a learning rate of 1e-3, and default momentum hyperparameters. ... We increase the batch size from 32 to 128. ... We use Nesterov's Accelerated Gradient (Nesterov, 1983) with a momentum parameter of 0.9 instead of standard gradient descent without momentum. ... Instead of fixing the number of iterations at 10 like DSPN, we set the number of iterations to 20 at the start of training and change it to 40 after 50 epochs. ... We drop the learning rate after 90 epochs from 1e-3 to 1e-4 for the last 10 epochs. ... Clipping the gradients in the inner optimization to a maximum L2 norm of 10 seemed to help. |
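
The experiment-setup excerpt lists concrete optimization choices: Adam at 1e-3 dropped to 1e-4 after 90 of 100 epochs, batch size 128, an inner loop of 20 Nesterov steps (40 after 50 epochs) with momentum 0.9, and inner gradients clipped to an L2 norm of 10. The sketch below is a minimal, hypothetical PyTorch rendering of those settings, not the authors' released code; `inner_optimization`, `grad_fn`, `outer_schedule`, and the unspecified inner-loop learning rate `inner_lr` are placeholders.

```python
import torch

def inner_optimization(initial_set, grad_fn, n_iters, inner_lr,
                       momentum=0.9, clip_norm=10.0):
    """Nesterov accelerated gradient on the predicted set, with inner-gradient clipping."""
    current = initial_set
    velocity = torch.zeros_like(current)
    for _ in range(n_iters):
        lookahead = current + momentum * velocity  # Nesterov look-ahead point
        grad = grad_fn(lookahead)                  # gradient of the inner (encoder-matching) loss
        grad_norm = grad.norm()
        if grad_norm > clip_norm:                  # clip inner gradient to a maximum L2 norm of 10
            grad = grad * (clip_norm / grad_norm)
        velocity = momentum * velocity - inner_lr * grad
        current = current + velocity
    return current

def outer_schedule(model, epoch):
    """Outer-loop settings from the excerpt: Adam at 1e-3 (1e-4 for the last 10
    of 100 epochs), batch size 128, 20 inner iterations becoming 40 after 50 epochs."""
    lr = 1e-3 if epoch < 90 else 1e-4
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # default Adam momentum hyperparameters
    n_inner_iters = 20 if epoch < 50 else 40
    batch_size = 128
    return optimizer, n_inner_iters, batch_size
```

The per-epoch branching in `outer_schedule` only mirrors the schedule stated in the excerpt; an actual implementation would more likely keep a single optimizer and apply a learning-rate scheduler rather than rebuilding Adam each epoch.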