HyperGAN: A Generative Model for Diverse, Performant Neural Networks

Authors: Neale Ratzlaff, Li Fuxin

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a variety of experiments to test HyperGAN's ability to both achieve high accuracy and obtain accurate uncertainty estimates. First we show classification performance on both MNIST and CIFAR-10 datasets.
Researcher Affiliation | Academia | School of Electrical Engineering and Computer Science, Oregon State University. Correspondence to: Neale Ratzlaff <ratzlafn@oregonstate.edu>, Li Fuxin <lif@oregonstate.edu>.
Pseudocode | No | The paper provides architectural diagrams (Figure 1) and mathematical formulations, but it does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide a link to a code repository.
Open Datasets | Yes | First we show classification performance on both MNIST and CIFAR-10 datasets.
Dataset Splits | Yes | For MNIST experiments we train a HyperGAN on the MNIST dataset and test on the notMNIST dataset: a 10-class set of 28x28 grayscale images depicting the letters A-J. In this setting, we want the softmax probabilities on inlier MNIST examples to have minimum entropy: a single large activation close to 1. On out-of-distribution data we want equal probability across predictions. Similarly, we test our CIFAR-10 model by training on the first 5 classes and using the latter 5 classes as out-of-distribution examples. (See the evaluation sketch below the table.)
Hardware Specification | No | The paper mentions memory usage and the type of computing unit: 'We trained our HyperGAN on MNIST using less than 1.5GB of memory on a single GPU, while CIFAR-10 used just 4GB, making HyperGAN surprisingly scalable.' However, it does not specify the exact model of the GPU or any other hardware components such as CPU or RAM.
Software Dependencies | No | The paper describes the models and methods used but does not specify any software libraries, frameworks, or their version numbers that were used for implementation or experimentation.
Experiment Setup | Yes | For MNIST experiments, HyperGAN has weight generators, each taking a latent vector z ∼ Q(z|s) ∈ R^128 as input. The target network for the MNIST experiments is a small two-layer convolutional network followed by 1 fully-connected layer, using leaky ReLU activations and 2x2 max pooling after each convolutional layer. For CIFAR-10, we use 5 weight generators with latent codes z ∼ Q(z|s) ∈ R^256. The target architecture for CIFAR-10 consists of three convolutional layers, each followed by leaky ReLU and 2x2 max pooling, followed by 2 fully connected layers. The mixer, generators, and discriminator are each 2-layer MLPs with 512 units in each layer and ReLU nonlinearity. (See the architecture sketch below the table.)
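
Evaluation sketch (referenced in the Dataset Splits row): a minimal sketch of the entropy-based in-distribution vs. out-of-distribution check described above. A PyTorch-style implementation is assumed (the paper does not state its framework), and sampled_networks stands in for a set of HyperGAN-generated target networks whose construction is not shown here.

    import torch
    import torch.nn.functional as F

    def predictive_entropy(sampled_networks, x):
        # Average the softmax outputs of several HyperGAN-sampled target
        # networks, then measure the entropy of the averaged distribution.
        with torch.no_grad():
            probs = torch.stack([F.softmax(net(x), dim=1) for net in sampled_networks])
            mean_probs = probs.mean(dim=0)  # shape: [batch, num_classes]
            entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=1)
        return entropy  # one entropy value per input example

On inlier MNIST batches the resulting entropies should concentrate near zero (a single activation close to 1), while on notMNIST batches they should approach the maximum entropy of a uniform 10-class distribution.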
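
Architecture sketch (referenced in the Experiment Setup row): a minimal sketch of the MNIST target network as quoted above, i.e. two convolutional layers and one fully-connected layer, with leaky ReLU activations and 2x2 max pooling after each convolution. Channel widths, kernel sizes, and the leaky-ReLU slope are not given in the excerpt, so the values below are illustrative assumptions, again in PyTorch style.

    import torch.nn as nn

    class MNISTTargetNet(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=5),   # 28x28 -> 24x24; width/kernel assumed
                nn.LeakyReLU(),
                nn.MaxPool2d(2),                   # 24x24 -> 12x12
                nn.Conv2d(32, 64, kernel_size=5),  # 12x12 -> 8x8; width/kernel assumed
                nn.LeakyReLU(),
                nn.MaxPool2d(2),                   # 8x8 -> 4x4
            )
            self.classifier = nn.Linear(64 * 4 * 4, num_classes)  # single FC layer

        def forward(self, x):
            h = self.features(x)
            return self.classifier(h.flatten(start_dim=1))

In HyperGAN, the weights of such a target network are produced by the weight generators from latent codes z ∼ Q(z|s) ∈ R^128 rather than trained directly; that generation step is outside the scope of this sketch.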