Empowering Convolutional Neural Nets with MetaSin Activation
Authors: Farnood Salehi, Tunç Aydin, André Gaillard, Guglielmo Camporese, Yuxuan Wang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments where we apply the three-step procedure discussed at the end of Section 3. Specifically, we switch an existing architecture's activations to METASIN, and re-train the resulting new model from scratch using KD-Bootstrapping. ... We demonstrate our method in the areas of Monte-Carlo denoising and image resampling where we set new state-of-the-art through a knowledge distillation based training procedure. |
| Researcher Affiliation | Collaboration | Farnood Salehi (1), Tunç Ozan Aydın (1,3), André Gaillard (3), Guglielmo Camporese (1,2), Yuxuan Wang (3); affiliations: 1 Disney Research\|Studios, 2 University of Padova, 3 ETH Zürich |
| Pseudocode | No | The paper describes the METASIN activation function and training procedure in text and mathematical equations, but does not include any structured pseudocode or algorithm blocks. (An illustrative, non-authoritative sketch is given after this table.) |
| Open Source Code | No | The paper states 'Equation 4 can efficiently be implemented as a module in common deep learning frameworks' and mentions 'Our implementation is based on [46]' for NeRF experiments, but it does not provide an explicit statement about releasing its own source code or a link to a repository for the methodology described in the paper. |
| Open Datasets | Yes | We ran our experiments on the CIFAR100 [23] dataset, which consists of 50K training and 10K validation images with resolution 32×32. ... Our base denoiser network is a U-Net architecture used in [38] and we use the same procedure to generate noisy and reference renderings. |
| Dataset Splits | Yes | The student networks were trained for 300 epochs using standard cross-entropy loss following the knowledge distillation methodology discussed in [36]. We ran our experiments on the CIFAR100 [23] dataset, which consists of 50K training and 10K validation images with resolution 32×32. |
| Hardware Specification | Yes | In Table 1 we compare the latency induced by the native PyTorch implementation against our customized METASIN operator with fused CUDA kernel functions. ... Table 1: Latency of executing a METASIN activation with K = 10 using native vs. efficient implementations, relative to the latency of executing a single RELU activation on Nvidia RTX 3090. |
| Software Dependencies | No | The paper mentions 'PyTorch' and 'Tensorflow' as common deep learning frameworks and discusses CUDA kernels, but it does not specify exact version numbers for these software dependencies (e.g., PyTorch 1.x, CUDA 11.x). |
| Experiment Setup | Yes | In order to train convolutional METASIN networks, we initialize the shape parameters as c_0 = 1, c_j^[l] = 0, f_j^[l] = j, and p_j^[l] ∼ U(0, π), for j ∈ [1, K] and l ∈ [1, L] by default. ... KD-Bootstrapping, which comprises approximately 5-10% of the total training iterations for the METASIN network. ... Our best METASIN network is configured with K = 5 and the frequencies are initialized as f_j^[l] = j/2. We use KD-bootstrapping during the first 5% of the training procedure. ... The student networks were trained for 300 epochs using standard cross-entropy loss following the knowledge distillation methodology discussed in [36]. ... In this experiment, we train the networks to up-sample the image reported in Figure 8, with an up-scale of 16, for 100,000 iterations using the Adam optimizer with a learning rate of 10^-5. (A hedged training sketch based on this setup follows the table.) |
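Since the paper provides no pseudocode, the following is only an illustrative sketch of what a MetaSin-style activation could look like as a native PyTorch module. It assumes the activation is a base identity term plus a sum of K learnable sinusoids, MetaSin(x) = c0·x + Σ_j c_j·sin(f_j·x + p_j); the authors' actual functional form is given by their Equation 4 and may differ. The class name, parameter names, and `freq_scale` argument are ours; only the initialization values (c_0 = 1, c_j = 0, f_j = j or j/2, p_j ~ U(0, π)) are taken from the quoted setup.

```python
import math
import torch
import torch.nn as nn


class MetaSin(nn.Module):
    """Illustrative MetaSin-style activation: an identity term plus K
    learnable sinusoids. A sketch based on the paper's description, not
    the authors' reference implementation (their Equation 4)."""

    def __init__(self, k: int = 10, freq_scale: float = 1.0):
        super().__init__()
        j = torch.arange(1, k + 1, dtype=torch.float32)
        # Initialization quoted in the paper: c0 = 1, c_j = 0,
        # f_j = j (or j/2 via freq_scale), p_j ~ U(0, pi).
        self.c0 = nn.Parameter(torch.tensor(1.0))
        self.c = nn.Parameter(torch.zeros(k))
        self.f = nn.Parameter(freq_scale * j)
        self.p = nn.Parameter(math.pi * torch.rand(k))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcast the K sinusoids over a trailing axis and sum them out.
        sines = self.c * torch.sin(self.f * x.unsqueeze(-1) + self.p)
        return self.c0 * x + sines.sum(dim=-1)
```

Written this way, the activation materializes an extra tensor with a trailing dimension of size K, which is presumably why the paper's Table 1 contrasts the native PyTorch implementation against a customized operator with fused CUDA kernels.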
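Similarly, the three-step procedure quoted in the Research Type and Experiment Setup rows (replace an existing architecture's activations with MetaSin, then re-train the new model from scratch with KD-Bootstrapping during roughly the first 5-10% of iterations) could be sketched as below. The `replace_activations` helper, the choice of distillation loss, and the loop structure are assumptions for illustration; the paper defers the exact distillation methodology to its reference [36]. The sketch reuses the `MetaSin` module from the previous block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def replace_activations(model: nn.Module, k: int = 5) -> nn.Module:
    """Hypothetical step-1 helper: swap every ReLU in an existing
    architecture for a MetaSin activation (K = 5, f_j = j/2 as in the
    paper's best configuration)."""
    for name, child in model.named_children():
        if isinstance(child, nn.ReLU):
            setattr(model, name, MetaSin(k=k, freq_scale=0.5))
        else:
            replace_activations(child, k)
    return model


def train_with_kd_bootstrapping(student, teacher, loader, total_iters,
                                bootstrap_frac=0.05, lr=1e-5):
    """Steps 2-3 (sketch): train the freshly initialized MetaSin student,
    matching the teacher's outputs for the first ~5% of iterations
    (KD-Bootstrapping) and switching to the task loss afterwards."""
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    teacher.eval()
    it = 0
    while it < total_iters:
        for x, y in loader:
            if it >= total_iters:
                break
            pred = student(x)
            if it < bootstrap_frac * total_iters:
                # Bootstrapping phase: imitate the pretrained teacher.
                with torch.no_grad():
                    target = teacher(x)
                loss = F.mse_loss(pred, target)
            else:
                # Regular phase: standard task loss (cross-entropy in
                # the CIFAR100 classification experiment).
                loss = F.cross_entropy(pred, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            it += 1
    return student
```

Here `student` would be `replace_activations` applied to a fresh, untrained copy of the teacher's architecture, matching the paper's "re-train the resulting new model from scratch" wording.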