Differential Equation Units: Learning Functional Forms of Activation Functions from Data
Authors: MohamadAli Torkamani, Shiv Shankar, Amirmohammad Rooshenas, Phillip Wallis
AAAI 2020, pp. 6030-6037 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have conducted several experiments to evaluate the performance and compactness of DEU networks. We evaluate DEU on different models considering the classification performance and model size. We first use MNIST and Fashion-MNIST as our datasets... |
| Researcher Affiliation | Collaboration | Mohamad Ali Torkamani (Amazon.com), Shiv Shankar (University of Massachusetts Amherst), Amirmohammad Rooshenas (University of Massachusetts Amherst), Phillip Wallis (Microsoft Dynamics 365 AI) |
| Pseudocode | Yes | Algorithm 1 Parallelized DEU: 1: procedure DEU(input) 2: output ← 0 3: for each singularity space S do 4: mask ← 1[n ∈ S] ∀n ∈ neurons 5: if Σ mask > 0 then 6: output ← output + mask ⊙ f_S(input) 7: end if 8: end for 9: return output 10: end procedure (a runnable sketch follows this table) |
| Open Source Code | Yes | The code is available at https://github.com/rooshenas/deu |
| Open Datasets | Yes | We first use MNIST and Fashion-MNIST as our datasets to assess the behavior of DEUs with respect to the commonly used ReLU activation function, as well as Maxout and SELU. ... on the CIFAR-10 dataset. ... a standard diabetes regression dataset.3 (https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html) |
| Dataset Splits | Yes | We use 3-fold cross validation and report the average performance. |
| Hardware Specification | Yes | We evaluate computation time of Resnet-18 models with DEUs and ReLUs on CIFAR-10 using a batch size of 128 on a Tesla K40c GPU. ... The training is done using GeForce GTX TITAN. |
| Software Dependencies | Yes | We have solved the equations and taken their derivatives using the software package Maple (2018). (an analogous open-source symbolic sketch follows this table) |
| Experiment Setup | Yes | The neural network is a 2-layer MLP with 1024- and 512-dimensional hidden layers, while the CNN is a 2-layer model made by stacking 32- and 16-dimensional convolutional filters atop one another, followed by average pooling. ... For these experiments, we keep the network architecture fixed to ResNet-18 (He et al. 2016a) and use the hyperparameter settings as in He et al. (2016a). ... We initialize parameters a, b, c for all neurons with a random positive number less than one, and strictly greater than zero. We initialize c1 = c2 = 0.0. ... Both the weights w, as well as θ, are learned using the conventional backpropagation algorithm with Adam updates (Kingma and Ba 2014). ... using a batch size of 128 ... The batch size is 256 (a setup sketch follows this table) |
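
Below is a minimal PyTorch sketch of the masking scheme in Algorithm 1, assuming the layer groups neurons by the singularity space their ODE parameters fall into and evaluates one shared closed-form solution f_S per space. The `deu_layer`, `in_space`, and `f_S` names are ours, and the toy predicates and activations stand in for the paper's actual singularity spaces and solved activation functions:

```python
import torch

def deu_layer(x, params, spaces):
    """Sketch of Algorithm 1 (Parallelized DEU): `spaces` pairs a membership
    predicate over the per-neuron ODE parameters with the closed-form
    solution f_S shared by every neuron in that singularity space."""
    output = torch.zeros_like(x)
    for in_space, f_S in spaces:
        mask = in_space(params).float()      # 1[n in S] for each neuron n
        if mask.sum() > 0:                   # skip empty singularity spaces
            output = output + mask * f_S(x)  # masked, broadcast activation
    return output

# Toy usage with two hypothetical singularity spaces keyed on one parameter:
params = torch.rand(4)                       # one scalar parameter per neuron
spaces = [
    (lambda p: p < 0.5,  torch.sigmoid),     # stand-in f_S for one space
    (lambda p: p >= 0.5, torch.tanh),        # stand-in f_S for the other
]
x = torch.randn(8, 4)                        # batch of 8 examples, 4 neurons
y = deu_layer(x, params, spaces)
```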
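
The Maple step reported under Software Dependencies can be mirrored with an open-source symbolic solver. The SymPy sketch below solves a generic second-order linear ODE with coefficients a, b, c; the forcing term on the right-hand side is an illustrative assumption, not necessarily the exact equation from the paper. The integration constants C1 and C2 in the general solution correspond to the c1 = c2 = 0.0 initialization quoted in the setup row:

```python
import sympy as sp

x = sp.symbols('x')
a, b, c = sp.symbols('a b c', positive=True)  # per-neuron ODE coefficients
f = sp.Function('f')

# Generic second-order linear ODE; the forcing term x is an assumption
# made here for illustration.
ode = sp.Eq(a * f(x).diff(x, 2) + b * f(x).diff(x) + c * f(x), x)

# dsolve returns the general solution with integration constants C1, C2
# (the paper initializes these to zero).
print(sp.dsolve(ode, f(x)))
```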
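
Finally, a hedged PyTorch sketch of the quoted training setup: a 2-layer MLP with 1024- and 512-dimensional hidden layers, per-neuron parameters a, b, c drawn uniformly from (0, 1), c1 = c2 = 0, and weights and activation parameters trained jointly with Adam. Input and output dimensions assume MNIST (784 inputs, 10 classes), ReLU stands in for the DEU activation (whose closed form depends on the solved ODE), and the learning rate is Adam's default, which the quoted setup does not specify:

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """2-layer MLP with 1024- and 512-dimensional hidden layers, as quoted.
    ReLU is a placeholder for the DEU activation."""
    def __init__(self, in_dim=784, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        return self.net(x)

model = MLP()

# Per-neuron ODE parameters: a, b, c random in (0, 1), strictly positive;
# integration constants c1 = c2 = 0.0, as described in the setup quote.
num_neurons = 1024 + 512
eps = 1e-6
a = nn.Parameter(torch.rand(num_neurons).clamp_min(eps))
b = nn.Parameter(torch.rand(num_neurons).clamp_min(eps))
c = nn.Parameter(torch.rand(num_neurons).clamp_min(eps))
c1 = nn.Parameter(torch.zeros(num_neurons))
c2 = nn.Parameter(torch.zeros(num_neurons))

# Weights w and activation parameters are learned jointly with Adam updates.
optimizer = torch.optim.Adam(list(model.parameters()) + [a, b, c, c1, c2])
```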