Differential Equation Units: Learning Functional Forms of Activation Functions from Data

Authors: MohamadAli Torkamani, Shiv Shankar, Amirmohammad Rooshenas, Phillip Wallis

AAAI 2020, pp. 6030-6037

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We have conducted several experiments to evaluate the performance and compactness of DEU networks. We evaluate DEU on different models considering the classification performance and model size. We first use MNIST and Fashion-MNIST as our datasets...
Researcher Affiliation | Collaboration | Mohamad Ali Torkamani (1), Shiv Shankar (2), Amirmohammad Rooshenas (2), Phillip Wallis (3); affiliations: (1) Amazon.com, (2) University of Massachusetts Amherst, (3) Microsoft Dynamics 365 AI
Pseudocode | Yes | Algorithm 1: Parallelized DEU. 1: procedure DEU(input) 2: output ← 0 3: for each singularity space S do 4: mask = 1[n ∈ S] ∀ n ∈ neurons 5: if mask > 0 then 6: output ← output + mask ⊙ f_S(input) 7: end if 8: end for 9: Return output 10: end procedure (a PyTorch sketch of this procedure appears after the table)
Open Source Code | Yes | The code is available at https://github.com/rooshenas/deu
Open Datasets | Yes | We first use MNIST and Fashion-MNIST as our datasets to assess the behavior of DEUs with respect to the commonly used ReLU activation function, as well as Maxout and SELU. ... on the CIFAR-10 dataset. ... a standard diabetes regression dataset (https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html)
Dataset Splits | Yes | We use 3-fold cross validation and report the average performance.
Hardware Specification | Yes | We evaluate computation time of ResNet-18 models with DEUs and ReLUs on CIFAR-10 using a batch size of 128 on a Tesla K40c GPU. ... The training is done using a GeForce GTX TITAN.
Software Dependencies | Yes | We have solved the equations and taken their derivatives using the software package Maple (2018). (A SymPy-based sketch of this step appears after the table.)
Experiment Setup | Yes | The neural network is a 2-layer MLP with 1024 and 512 dimensional hidden layers, while the CNN used is a 2-layer model made by stacking 32 and 16 dimensional convolutional filters atop one another followed by average pooling. ... For these experiments, we keep the network architecture fixed to ResNet-18 (He et al. 2016a) and use the hyperparameter settings as in He et al. (2016a). ... We initialize parameters a, b, c for all neurons with a random positive number less than one, and strictly greater than zero. We initialize c1 = c2 = 0.0. ... Both the weights w, as well as θ, are learned using the conventional backpropagation algorithm with Adam updates (Kingma and Ba 2014). ... using a batch size of 128 ... The batch size is 256. (A PyTorch sketch of the MLP and CNN architectures appears after the table.)
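The Algorithm 1 pseudocode quoted above groups neurons by the singularity space of their learned activation and applies each closed-form function f_S only where its mask is non-zero. Below is a minimal PyTorch sketch of that loop; the grouping of neurons into singularity spaces, the placeholder activations, and the tensor shapes are illustrative assumptions, not the authors' implementation (see https://github.com/rooshenas/deu for the real code).

```python
import torch

def parallelized_deu(pre_activation, singularity_masks, activation_fns):
    """Hedged sketch of Algorithm 1 (Parallelized DEU).

    pre_activation:    (batch, n_neurons) tensor of neuron inputs.
    singularity_masks: dict mapping a singularity-space id S to a
                       (n_neurons,) 0/1 tensor marking the neurons whose
                       learned parameters fall in S (assumed grouping).
    activation_fns:    dict mapping the same ids to the closed-form
                       activation f_S for that singularity space.
    """
    output = torch.zeros_like(pre_activation)        # line 2: output <- 0
    for S, mask in singularity_masks.items():        # line 3: for each singularity space S
        if mask.sum() > 0:                           # line 5: skip empty groups
            # line 6: apply f_S to all masked neurons in parallel
            output = output + mask * activation_fns[S](pre_activation)
    return output                                    # line 9: return output


# Toy usage with two hypothetical singularity spaces and placeholder activations.
masks = {"S1": torch.tensor([1., 1., 0., 0.]), "S2": torch.tensor([0., 0., 1., 1.])}
fns = {"S1": torch.relu, "S2": torch.tanh}
x = torch.randn(8, 4)
y = parallelized_deu(x, masks, fns)
```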
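The Software Dependencies row states that the closed-form solutions and their derivatives were obtained with Maple (2018). The exact differential equation is not quoted in this table; assuming a second-order linear ODE with coefficients a, b, c (consistent with the a, b, c parameters mentioned under Experiment Setup), the same step can be reproduced in SymPy as a rough stand-in for Maple. The forcing term x and the concrete coefficients below are illustrative only.

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Function('f')

# Assumed functional form: a*f'' + b*f' + c*f = x, shown with concrete
# coefficients (a=1, b=3, c=2) so dsolve returns a simple closed form;
# the paper solves the parametric version symbolically in Maple.
a, b, c = 1, 3, 2
ode = sp.Eq(a * f(x).diff(x, 2) + b * f(x).diff(x) + c * f(x), x)

solution = sp.dsolve(ode, f(x))          # closed-form f(x) with constants C1, C2
derivative = sp.diff(solution.rhs, x)    # derivative needed for backpropagation

print(solution)      # e.g. Eq(f(x), C1*exp(-2*x) + C2*exp(-x) + x/2 - 3/4)
print(sp.simplify(derivative))
```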
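The Experiment Setup row describes a 2-layer MLP (1024 and 512 hidden units) and a 2-layer CNN (32 then 16 convolutional filters followed by average pooling), trained with Adam. The PyTorch sketch below follows that description; the kernel size, activation placement, MNIST-style 1x28x28 input, and 10-class output are assumptions not specified in the quoted text, and the learned DEU activation is replaced by a placeholder.

```python
import torch
import torch.nn as nn

# Placeholder for the learned DEU activation; swap in the authors' module
# from https://github.com/rooshenas/deu for a faithful reproduction.
Activation = nn.ReLU

# 2-layer MLP with 1024- and 512-dimensional hidden layers (MNIST: 784 inputs).
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 1024), Activation(),
    nn.Linear(1024, 512), Activation(),
    nn.Linear(512, 10),
)

# 2-layer CNN: 32 then 16 convolutional filters, followed by average pooling.
# Kernel size 3 and the final pooling/classifier layout are assumptions.
cnn = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), Activation(),
    nn.Conv2d(32, 16, kernel_size=3, padding=1), Activation(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)

# Both the network weights and the per-neuron parameters are trained with Adam,
# as stated in the Experiment Setup row (batch sizes of 128 or 256 are quoted).
optimizer = torch.optim.Adam(list(mlp.parameters()) + list(cnn.parameters()))
```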