Generative Pseudo-Inverse Memory

Authors: Kha Pham, Hung Le, Man Ngo, Truyen Tran, Bao Ho, Svetha Venkatesh

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically we demonstrate the efficiency and versatility of GPM on a comprehensive suite of experiments involving binarized MNIST, binarized Omniglot, Fashion MNIST, CIFAR10 & CIFAR100 and Celeb A.
Researcher Affiliation | Academia | Kha Pham (1), Hung Le (1), Man Ngo (2), Truyen Tran (1), Bao Ho (3) and Svetha Venkatesh (1). (1) Applied Artificial Intelligence Institute, Deakin University; (2) Faculty of Mathematics and Computer Science, VNUHCM-University of Science; (3) Vietnam Institute for Advanced Study in Mathematics
Pseudocode | Yes | Algorithm 1: Single training step of Generative Pseudo-Inverse Memory
Open Source Code | Yes | Codes are available at https://github.com/phamtienkha/generative-pseudoinverse-memory.
Open Datasets | Yes | We validate these theoretical insights through a comprehensive suite of experiments on binarized MNIST (Le Cun et al., 2010), binarized Omniglot (Burda et al., 2016), Fashion MNIST (Xiao et al., 2017), CIFAR10 & CIFAR100 (Krizhevsky, 2009) and Celeb A (Liu et al., 2015), demonstrating superior results.
Dataset Splits | No | The paper specifies a training and test split for the Omniglot dataset ('24,345 training and 8,070 test examples') but does not mention a validation split.
Hardware Specification | No | The paper states 'All operations are computed on a single GPU.' but does not specify the model or type of GPU, CPU, or any other specific hardware details.
Software Dependencies | Yes | We use the inverse function of Pytorch 1.8.0 (Paszke et al., 2017) for batch matrix inverse.
Experiment Setup | Yes | In all experiments, we use the Adam optimizer with learning rate varying from 5e-5 to 5e-4 depending on the dataset. We use weight decay of 1e-3 along with gradient clipping at threshold 10. The encoder consists of 4 layers, each of which is a convolutional layer with a 4x4 filter and stride 2, followed by a ResNet block with bottleneck (He et al., 2016). The decoder is simply a mirror of the encoder with transpose convolutional layers. We use the swish activation function (Ramachandran et al., 2017) for the non-linear layers. We run the Ben-Cohen algorithm for 7 steps to approximate the pseudo-inverses, with the initial term being 10^-3 times the transpose of the matrix whose pseudo-inverse we want to calculate.
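
The Pseudocode row above refers to Algorithm 1, a single training step of GPM. The full step in the paper involves the encoder, decoder and a generative training objective; the snippet below is only a minimal sketch of the pseudo-inverse write/read operation that gives the model its name, with illustrative names (write_memory, read_memory) that are not taken from the released code.

    # Minimal sketch of a pseudo-inverse associative write/read (illustrative only;
    # Algorithm 1 in the paper additionally involves the encoder/decoder and a
    # generative training objective, which are omitted here).
    import torch

    def write_memory(addresses, contents):
        # Least-squares write: M = A^+ Z, so that addresses @ M approximates contents.
        # addresses: (num_items, num_slots), contents: (num_items, code_dim)
        return torch.pinverse(addresses) @ contents   # memory of shape (num_slots, code_dim)

    def read_memory(memory, query_addresses):
        # Read by projecting query addresses through the memory: Z_hat = A @ M.
        return query_addresses @ memory

    # Toy usage: with more memory slots than stored items, the write is (numerically) exact.
    A = torch.randn(8, 16)    # 8 items addressed over 16 memory slots
    Z = torch.randn(8, 32)    # 8 encoded items of dimension 32
    M = write_memory(A, Z)
    print(torch.allclose(read_memory(M, A), Z, atol=1e-4))   # True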
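
The Software Dependencies row quotes the use of PyTorch 1.8.0's inverse function for batch matrix inverse. torch.inverse accepts a batch of square matrices of shape (*, n, n); a small self-contained illustration with made-up shapes:

    # torch.inverse in PyTorch 1.8.0 accepts batched input of shape (*, n, n).
    import torch

    batch = torch.randn(64, 16, 16) + 16 * torch.eye(16)    # 64 well-conditioned square matrices
    inv = torch.inverse(batch)                               # shape (64, 16, 16)
    identity = torch.eye(16).expand(64, 16, 16)
    print(torch.allclose(batch @ inv, identity, atol=1e-4))  # True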
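
The Experiment Setup row describes approximating pseudo-inverses with 7 steps of the 'Ben-Cohen algorithm', starting from 10^-3 times the transpose of the target matrix. Read as the Ben-Israel and Cohen iteration X_{k+1} = 2*X_k - X_k A X_k, a hedged sketch looks as follows; the function name and test matrix are illustrative and not taken from the released code. The iteration is stable when the initial scale is small relative to the largest singular value of the matrix, and the error keeps shrinking as the number of steps grows.

    # Hedged sketch of the quoted pseudo-inverse approximation, read here as the
    # Ben-Israel & Cohen iteration X_{k+1} = 2*X_k - X_k @ A @ X_k, started from
    # X_0 = 1e-3 * A^T and run for 7 steps as in the quoted setup.
    import torch

    def approx_pinv(A, steps=7, init_scale=1e-3):
        X = init_scale * A.transpose(-2, -1)   # initial term: 1e-3 times the transpose
        for _ in range(steps):
            X = 2 * X - X @ A @ X              # hyper-power update toward the pseudo-inverse A^+
        return X

    A = torch.randn(8, 4)
    for steps in (7, 15, 30):
        err = torch.dist(approx_pinv(A, steps=steps), torch.pinverse(A))
        print(steps, err.item())   # the error shrinks toward zero as steps grows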