Generative Pseudo-Inverse Memory
Authors: Kha Pham, Hung Le, Man Ngo, Truyen Tran, Bao Ho, Svetha Venkatesh
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically we demonstrate the efficiency and versatility of GPM on a comprehensive suite of experiments involving binarized MNIST, binarized Omniglot, Fashion MNIST, CIFAR10 & CIFAR100 and CelebA. |
| Researcher Affiliation | Academia | Kha Pham (1), Hung Le (1), Man Ngo (2), Truyen Tran (1), Bao Ho (3) and Svetha Venkatesh (1); (1) Applied Artificial Intelligence Institute, Deakin University; (2) Faculty of Mathematics and Computer Science, VNUHCM-University of Science; (3) Vietnam Institute for Advanced Study in Mathematics |
| Pseudocode | Yes | Algorithm 1 Single training step of Generative Pseudo-Inverse Memory |
| Open Source Code | Yes | Codes are available at https://github.com/phamtienkha/generative-pseudoinverse-memory. |
| Open Datasets | Yes | We validate these theoretical insights through a comprehensive suite of experiments on binarized MNIST (LeCun et al., 2010), binarized Omniglot (Burda et al., 2016), Fashion MNIST (Xiao et al., 2017), CIFAR10 & CIFAR100 (Krizhevsky, 2009) and CelebA (Liu et al., 2015), demonstrating superior results. |
| Dataset Splits | No | The paper specifies a training and test split for the Omniglot dataset ('24,345 training and 8,070 test examples') but does not mention a validation split. |
| Hardware Specification | No | The paper states 'All operations are computed on a single GPU.' but does not specify the model or type of GPU, CPU, or any other specific hardware details. |
| Software Dependencies | Yes | We use the inverse function of Pytorch 1.8.0 (Paszke et al., 2017) for batch matrix inverse. |
| Experiment Setup | Yes | In all experiments, we use the Adam optimizer with learning rate varying from 5e-5 to 5e-4 depending on the dataset. We use weight decay of 1e-3 along with gradient clipping at threshold 10. The encoder consists of 4 layers, each of which is a convolution layer with a 4x4 filter and stride 2 followed by a ResNet block with bottleneck (He et al., 2016). The decoder is simply a mirror of the encoder with transposed convolutional layers. We use the swish activation function (Ramachandran et al., 2017) for non-linear layers. We run the Ben-Israel and Cohen algorithm for 7 steps to approximate the pseudo-inverses, with the initial term being 1e-3 times the transpose of the matrix whose pseudo-inverse we want to calculate. (Minimal sketches of the pseudo-inverse iteration and this training setup follow the table.) |
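
The Experiment Setup row describes the pseudo-inverse approximation only at a high level. The sketch below is a minimal illustration, assuming the standard Ben-Israel and Cohen (Newton-Schulz) update X_{k+1} = 2X_k - X_k M X_k with the quoted settings (7 steps, initial term 1e-3 times the transpose); the helper name `approx_pinv` and the test matrices are illustrative, while `torch.inverse` is the batched matrix inverse named in the Software Dependencies row.

```python
import torch

def approx_pinv(m: torch.Tensor, steps: int = 7, alpha: float = 1e-3) -> torch.Tensor:
    """Approximate Moore-Penrose pseudo-inverses of a batch of matrices with
    the Ben-Israel & Cohen iteration X_{k+1} = 2*X_k - X_k @ M @ X_k,
    starting from X_0 = alpha * M^T (7 steps and alpha = 1e-3 per the quote)."""
    x = alpha * m.transpose(-2, -1)
    for _ in range(steps):
        x = 2.0 * x - x @ m @ x
    return x

# torch.inverse handles batches of square matrices, as noted in the Software
# Dependencies row; the well-conditioned test batch below is illustrative.
m = torch.randn(8, 16, 16) + 16.0 * torch.eye(16)
exact = torch.inverse(m)
approx = approx_pinv(m)
print((exact - approx).abs().max())  # small residual after 7 steps
```

With a fixed 1e-3 scaling, the iteration only converges for reasonably well-conditioned inputs, which is why the demo adds a dominant diagonal to the random batch.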
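The same row also specifies the optimizer and encoder design. The sketch below wires the quoted hyperparameters (Adam, weight decay 1e-3, gradient clipping at threshold 10, 4x4 stride-2 convolutions followed by bottleneck ResNet blocks, swish/SiLU activations) into PyTorch; channel widths, the bottleneck ratio, the input resolution, and the dummy objective are assumptions made for illustration, not values from the paper.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One encoder stage: a 4x4 stride-2 convolution followed by a bottleneck
    residual block with swish (SiLU) activations. Widths are illustrative."""
    def __init__(self, in_ch: int, out_ch: int, bottleneck: int = 4):
        super().__init__()
        self.down = nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)
        mid = max(out_ch // bottleneck, 1)
        self.res = nn.Sequential(
            nn.Conv2d(out_ch, mid, 1), nn.SiLU(),
            nn.Conv2d(mid, mid, 3, padding=1), nn.SiLU(),
            nn.Conv2d(mid, out_ch, 1),
        )

    def forward(self, x):
        x = self.down(x)
        return x + self.res(x)

# Four encoder stages; the decoder would mirror them with transposed convolutions.
encoder = nn.Sequential(
    EncoderBlock(1, 32), EncoderBlock(32, 64),
    EncoderBlock(64, 128), EncoderBlock(128, 256),
)

# Optimizer and clipping as quoted: Adam with a dataset-dependent learning rate
# in [5e-5, 5e-4], weight decay 1e-3, gradient clipping at threshold 10.
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4, weight_decay=1e-3)
loss = encoder(torch.randn(2, 1, 64, 64)).pow(2).mean()  # dummy objective
loss.backward()
torch.nn.utils.clip_grad_norm_(encoder.parameters(), max_norm=10.0)
optimizer.step()
```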