Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Globally Injective ReLU Networks
Authors: Michael Puthawala, Konik Kothari, Matti Lassas, Ivan Dokmanić, Maarten de Hoop
JMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Numerical Experiments. In this section we present numerical results. First we numerically investigate the results of Theorem 7, and then show that injective GANs improve inference. 4.1 Experiments Testing the Minimal Expansivity Threshold... 4.2 Injective GANs Improve Inference. We use a single ReLU layer, f(x) = ReLU(Wx), and choose x to be an MNIST digit for ease of visualization of the latent space. We choose W ∈ ℝ^(cn×n) with i.i.d. Gaussian entries and set c = m/n = 2.1. We devise a simple experiment to demonstrate that 1) one can construct injective networks using Corollary 6 that 2) perform as well as the corresponding non-injective networks, while 3) being better at inference problems. We test on CelebA (Liu et al., 2015) and FFHQ (Karras et al., 2019) data sets. To get a performance metric, we fit Gaussian distributions N(µ, Σ) and N(µ_inj, Σ_inj) to G(z) and G_inj(z). We then compute the Wasserstein-2 distance W₂ between the distribution of z ∼ N(0, I_256) and the two fitted Gaussians using the closed-form expression for W₂ between Gaussians, W₂²(N(µ₁, Σ₁), N(µ₂, Σ₂)) = ‖µ₁ − µ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^(1/2)). We summarize the results in Table 1. Despite the restrictions on the weights of injective generators, their performance on popular GAN metrics Fréchet inception distance (FID) (Heusel et al., 2017) and inception score (IS) (Salimans et al., 2016) is comparable to the standard GAN while inference improves. |
| Researcher Affiliation | Academia | Michael Puthawala, Department of Computational and Applied Mathematics, Rice University, MS-134, 6100 Main St., Houston, TX 77005, USA; Konik Kothari, Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA; Matti Lassas, Department of Mathematics and Statistics, University of Helsinki, P.O. Box 68, FI-00014 University of Helsinki, Finland; Ivan Dokmanić, Department of Mathematics and Computer Science, University of Basel, 4051 Basel, Switzerland; Maarten de Hoop, Department of Computational and Applied Mathematics, Rice University, MS-134, 6100 Main St., Houston, TX 77005, USA |
| Pseudocode | No | The paper focuses on theoretical analysis, proofs, and numerical experiments based on existing architectures. It does not include any clearly labeled pseudocode or algorithm blocks describing a method or procedure. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology or provide links to code repositories. |
| Open Datasets | Yes | We choose x to be an MNIST digit for ease of visualization of the latent space... We test on CelebA (Liu et al., 2015) and FFHQ (Karras et al., 2019) data sets. |
| Dataset Splits | No | We train for 40 epochs on a data set of size 80000 samples. We use a batch size of 64 and Adam optimizer for training with learning rate of 10^-4. We report FID (Heusel et al., 2017) and Inception score (Salimans et al., 2016) using 10000 generated samples. The standard deviation was calculated using 5 sets of 10000 generated samples. In order to calculate the mean and covariance of generated distributions, we sample 50000 codes. The paper mentions the total number of samples and generated samples for evaluation but does not specify the train/test/validation splits for the input datasets used. |
| Hardware Specification | No | The paper's 'Numerical Experiments' section and its 'Architecture Details for Experiments' appendix describe the experimental setup, model architectures, and training parameters, but do not mention any specific hardware (e.g., GPU/CPU models, processor types, or memory) used for running the experiments. |
| Software Dependencies | No | We use the Wasserstein loss with gradient penalty (Gulrajani et al., 2017) to train our networks. We train for 40 epochs... We use a batch size of 64 and Adam optimizer for training with learning rate of 10^-4. While the paper mentions the Adam optimizer, it does not specify versions for any key software components, libraries, or programming languages used (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Generator network: We train a generator with 5 convolutional layers. The input latent code is 256-dimensional, which is treated by the network as a 1×1×256 size tensor. The first layer is a transposed convolution with a kernel size of 4×4 with stride 1 and 1024 output channels. This is followed by a leaky ReLU. We follow this up by 3 conv layers, each of which halves the number of channels and doubles the image size (that is, we go from an N/2 × N/2 × C tensor to an N × N × C/2 tensor), giving an expansivity of two, the minimum required for injectivity of ReLU networks. Each of these 3 convolution layers has kernel size 3, stride 2 and is followed by the ReLU activation. These layers are made injective by having half the filters as w and the other half as −s²w. Here, w and s are trainable parameters. The biases in these layers are kept at zero. We do not employ any normalization schemes. Lastly, we have a convolution layer at the end to get to 3 channels and the required image size. This layer is followed by the sigmoidal activation. Critic network: The discriminator has 5 convolution layers with 128, 256, 512, 1024 and 1 channels per layer. Each convolution layer has 4×4 kernels with stride 2. Each layer is followed by the leaky-ReLU activation function. The last layer of the network is followed by identity. Inference network: The inference network has the same architecture as the first 4 convolution layers of the discriminator. This is followed by 3 fully-connected layers of size 512, 256 and 256. The first 2 fully-connected layers have a leaky ReLU activation while the last layer has identity activation function. The inference net is trained in tandem with the GAN. We use the Wasserstein loss with gradient penalty (Gulrajani et al., 2017) to train our networks. We train for 40 epochs on a data set of size 80000 samples. We use a batch size of 64 and Adam optimizer for training with learning rate of 10^-4. |
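The generator described in the Experiment Setup row can be sketched in PyTorch. This is a hypothetical reconstruction, not the authors' code: the output resolution (64×64 is assumed here), the use of transposed convolutions for the upsampling layers, the leaky-ReLU slope, and the names `InjectiveUp`, `Generator` and `s` are all assumptions; only the layer counts, kernel sizes, channel widths, and the (w, −s²w) filter pairing come from the quoted description.

```python
import torch
import torch.nn as nn

class InjectiveUp(nn.Module):
    """Upsampling layer whose filters come in pairs (w, -s^2 w).

    Half the output channels respond to filters w, the other half to
    -s^2 w, giving the expansivity of two the paper requires for
    injective ReLU layers. Biases are kept at zero, per the paper."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        assert out_ch % 2 == 0
        self.conv = nn.ConvTranspose2d(in_ch, out_ch // 2, kernel_size=3,
                                       stride=2, padding=1, output_padding=1,
                                       bias=False)
        self.s = nn.Parameter(torch.ones(1))  # trainable scale

    def forward(self, x):
        pos = self.conv(x)          # responses to filters w
        neg = -(self.s ** 2) * pos  # by linearity, responses to filters -s^2 w
        return torch.relu(torch.cat([pos, neg], dim=1))

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        # 1x1x256 latent -> 4x4x1024 via a stride-1 transposed conv
        self.head = nn.Sequential(
            nn.ConvTranspose2d(256, 1024, kernel_size=4, stride=1),
            nn.LeakyReLU(0.2),
        )
        # three layers that halve channels and double spatial size
        self.up = nn.Sequential(
            InjectiveUp(1024, 512),  # 4x4   -> 8x8
            InjectiveUp(512, 256),   # 8x8   -> 16x16
            InjectiveUp(256, 128),   # 16x16 -> 32x32
        )
        # final conv to 3 channels and the assumed 64x64 size, then sigmoid
        self.tail = nn.Sequential(
            nn.ConvTranspose2d(128, 3, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, z):
        return self.tail(self.up(self.head(z)))
```

Note that multiplying the conv output by −s² is equivalent to convolving with the filters −s²w, since convolution with zero bias is linear.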
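The (w, −s²w) pairing is what makes each ReLU layer injective: because ReLU(a) − ReLU(−a) = a, the two halves of the output together determine Wx exactly. A minimal NumPy sketch (the special case s = 1, with a hypothetical square invertible W) demonstrates the recovery:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = rng.standard_normal((n, n))  # a.s. invertible Gaussian matrix
B = np.vstack([W, -W])           # expansivity 2: filters paired as w and -w

def layer(x):
    """One injective ReLU layer, y = ReLU(Bx)."""
    return np.maximum(B @ x, 0.0)

# ReLU(Wx) - ReLU(-Wx) = Wx, so the input is exactly recoverable:
x = rng.standard_normal(n)
y = layer(x)
x_rec = np.linalg.solve(W, y[:n] - y[n:])
```

For a general trainable scale s, the same recovery works after dividing the second half of the outputs by s².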
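The closed-form Wasserstein-2 expression quoted in the Research Type row can be evaluated directly. A small sketch assuming NumPy and SciPy; the function name `w2_gaussians` is a placeholder, and the symmetric form (Σ₂^(1/2) Σ₁ Σ₂^(1/2))^(1/2) is used for numerical stability — its trace equals that of (Σ₁Σ₂)^(1/2):

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussians(mu1, cov1, mu2, cov2):
    """Closed-form Wasserstein-2 distance between N(mu1, cov1) and N(mu2, cov2):
    W2^2 = ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^{1/2})."""
    mean_term = np.sum((mu1 - mu2) ** 2)
    s2_half = sqrtm(cov2)
    cross = sqrtm(s2_half @ cov1 @ s2_half)   # symmetric matrix square root
    cov_term = np.trace(cov1 + cov2 - 2 * np.real(cross))
    # clip tiny negative values caused by floating-point error
    return np.sqrt(max(mean_term + cov_term, 0.0))
```

For example, between N(0, I) and N(µ, I) the covariance term vanishes and the distance reduces to ‖µ‖.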