Differential Equation Units: Learning Functional Forms of Activation Functions from Data

Authors: MohamadAli Torkamani, Shiv Shankar, Amirmohammad Rooshenas, Phillip Wallis

AAAI 2020, pp. 6030-6037

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We have conducted several experiments to evaluate the performance and compactness of DEU networks. We evaluate DEU on different models considering the classification performance and model size. We first use MNIST and Fashion-MNIST as our datasets...
Researcher Affiliation | Collaboration | Mohamad Ali Torkamani (1), Shiv Shankar (2), Amirmohammad Rooshenas (2), Phillip Wallis (3); affiliations: (1) Amazon.com, (2) University of Massachusetts Amherst, (3) Microsoft Dynamics 365 AI
Pseudocode | Yes | Algorithm 1: Parallelized DEU. 1: procedure DEU(input) 2: output ← 0 3: for each singularity space S do 4: mask = 1[n ∈ S] ∀ n ∈ neurons 5: if mask > 0 then 6: output ← output + mask ⊙ f_S(input) 7: end if 8: end for 9: Return output 10: end procedure (a PyTorch sketch of this procedure appears after the table)
Open Source Code | Yes | The code is available at https://github.com/rooshenas/deu
Open Datasets | Yes | We first use MNIST and Fashion-MNIST as our datasets to assess the behavior of DEUs with respect to the commonly used ReLU activation function, as well as Maxout and SELU. ... on the CIFAR-10 dataset. ... a standard diabetes regression dataset (https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html)
Dataset Splits | Yes | We use 3-fold cross validation and report the average performance.
Hardware Specification | Yes | We evaluate computation time of ResNet-18 models with DEUs and ReLUs on CIFAR-10 using a batch size of 128 on a Tesla K40c GPU. ... The training is done using a GeForce GTX TITAN.
Software Dependencies | Yes | We have solved the equations and taken their derivatives using the software package Maple (2018). (A SymPy-based sketch of this step appears after the table.)
Experiment Setup | Yes | The neural network is a 2-layer MLP with 1024 and 512 dimensional hidden layers, while the CNN used is a 2-layer model made by stacking 32 and 16 dimensional convolutional filters atop one another followed by average pooling. ... For these experiments, we keep the network architecture fixed to ResNet-18 (He et al. 2016a) and use the hyperparameter settings as in He et al. (2016a). ... We initialize parameters a, b, c for all neurons with a random positive number less than one, and strictly greater than zero. We initialize c1 = c2 = 0.0. ... Both the weights w, as well as θ, are learned using the conventional backpropagation algorithm with Adam updates (Kingma and Ba 2014). ... using a batch size of 128 ... The batch size is 256. (A PyTorch sketch of the MLP and CNN architectures appears after the table.)
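The Algorithm 1 pseudocode quoted above groups neurons by the singularity space of their learned activation and applies each closed-form function f_S only where its mask is non-zero. Below is a minimal PyTorch sketch of that loop; the grouping of neurons into singularity spaces, the placeholder activations, and the tensor shapes are illustrative assumptions, not the authors' implementation (see https://github.com/rooshenas/deu for the real code).

```python
import torch

def parallelized_deu(pre_activation, singularity_masks, activation_fns):
    """Hedged sketch of Algorithm 1 (Parallelized DEU).

    pre_activation:    (batch, n_neurons) tensor of neuron inputs.
    singularity_masks: dict mapping a singularity-space id S to a
                       (n_neurons,) 0/1 tensor marking the neurons whose
                       learned parameters fall in S (assumed grouping).
    activation_fns:    dict mapping the same ids to the closed-form
                       activation f_S for that singularity space.
    """
    output = torch.zeros_like(pre_activation)        # line 2: output <- 0
    for S, mask in singularity_masks.items():        # line 3: for each singularity space S
        if mask.sum() > 0:                           # line 5: skip empty groups
            # line 6: apply f_S to all masked neurons in parallel
            output = output + mask * activation_fns[S](pre_activation)
    return output                                    # line 9: return output


# Toy usage with two hypothetical singularity spaces and placeholder activations.
masks = {"S1": torch.tensor([1., 1., 0., 0.]), "S2": torch.tensor([0., 0., 1., 1.])}
fns = {"S1": torch.relu, "S2": torch.tanh}
x = torch.randn(8, 4)
y = parallelized_deu(x, masks, fns)
```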
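The Software Dependencies row states that the closed-form solutions and their derivatives were obtained with Maple (2018). The exact differential equation is not quoted in this table; assuming a second-order linear ODE with coefficients a, b, c (consistent with the a, b, c parameters mentioned under Experiment Setup), the same step can be reproduced in SymPy as a rough stand-in for Maple. The forcing term x and the concrete coefficients below are illustrative only.

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Function('f')

# Assumed functional form: a*f'' + b*f' + c*f = x, shown with concrete
# coefficients (a=1, b=3, c=2) so dsolve returns a simple closed form;
# the paper solves the parametric version symbolically in Maple.
a, b, c = 1, 3, 2
ode = sp.Eq(a * f(x).diff(x, 2) + b * f(x).diff(x) + c * f(x), x)

solution = sp.dsolve(ode, f(x))          # closed-form f(x) with constants C1, C2
derivative = sp.diff(solution.rhs, x)    # derivative needed for backpropagation

print(solution)      # e.g. Eq(f(x), C1*exp(-2*x) + C2*exp(-x) + x/2 - 3/4)
print(sp.simplify(derivative))
```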
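The Experiment Setup row describes a 2-layer MLP (1024 and 512 hidden units) and a 2-layer CNN (32 then 16 convolutional filters followed by average pooling), trained with Adam. The PyTorch sketch below follows that description; the kernel size, activation placement, MNIST-style 1x28x28 input, and 10-class output are assumptions not specified in the quoted text, and the learned DEU activation is replaced by a placeholder.

```python
import torch
import torch.nn as nn

# Placeholder for the learned DEU activation; swap in the authors' module
# from https://github.com/rooshenas/deu for a faithful reproduction.
Activation = nn.ReLU

# 2-layer MLP with 1024- and 512-dimensional hidden layers (MNIST: 784 inputs).
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 1024), Activation(),
    nn.Linear(1024, 512), Activation(),
    nn.Linear(512, 10),
)

# 2-layer CNN: 32 then 16 convolutional filters, followed by average pooling.
# Kernel size 3 and the final pooling/classifier layout are assumptions.
cnn = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), Activation(),
    nn.Conv2d(32, 16, kernel_size=3, padding=1), Activation(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)

# Both the network weights and the per-neuron parameters are trained with Adam,
# as stated in the Experiment Setup row (batch sizes of 128 or 256 are quoted).
optimizer = torch.optim.Adam(list(mlp.parameters()) + list(cnn.parameters()))
```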