ErfAct and Pserf: Non-monotonic Smooth Trainable Activation Functions

Authors: Koushik Biswas, Sandeep Kumar, Shilpak Banerjee, Ashish Kumar Pandey

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments suggest that the proposed functions significantly improve network performance compared to widely used activations such as ReLU, Swish, and Mish. (A sketch of both proposed activations appears after this table.)
Researcher Affiliation | Academia | (1) Department of Computer Science, IIIT Delhi, New Delhi, India; (2) Department of Mathematics, Shaheed Bhagat Singh College, University of Delhi, New Delhi, India; (3) Department of Mathematics, IIIT Delhi, New Delhi, India
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper links to an extended version on arXiv (https://arxiv.org/abs/2109.04386), but this points to the paper itself rather than a code repository, and there is no explicit statement about releasing code for the described methodology.
Open Datasets | Yes | We present a detailed experimental comparison on MNIST (LeCun, Cortes, and Burges 2010), Fashion MNIST (Xiao, Rasul, and Vollgraf 2017), SVHN (Netzer et al. 2011), CIFAR10 (Krizhevsky and Hinton 2009), CIFAR100 (Krizhevsky and Hinton 2009), Tiny ImageNet (Le and Yang 2015), and ImageNet-1k (Deng et al. 2009) datasets for the image classification problem... We present experimental results on the Cityscapes dataset (Cordts et al. 2016)... Pascal VOC dataset (Everingham et al. 2010)... WMT 2014 English-German dataset. (A dataset-loading sketch appears after this table.)
Dataset Splits | Yes | Tiny ImageNet (Le and Yang 2015), which is a dataset of the same type as ILSVRC, consists of 200 classes of RGB images of size 64×64, with 100,000 training images, 10,000 validation images, and 10,000 test images.
Hardware Specification | Yes | An NVIDIA Tesla V100 GPU with 32 GB RAM is used to run the experiments.
Software Dependencies | No | The paper mentions optimizers like SGD and Adam and initializers like He Normal, but does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used.
Experiment Setup | Yes | We have initialized the parameters α = 0.75, β = 0.75 for ErfAct, and γ = 1.25, δ = 0.85 for Pserf... The networks are trained up to 200 epochs with the SGD optimizer... 0.9 momentum, and 5e-4 weight decay. We have started with a 0.01 initial learning rate and decay the learning rate with cosine annealing... We consider a batch size of 128. For Tiny ImageNet: the model is trained with a batch size of 32, He Normal initializer... 0.2 dropout rate... Adam optimizer... with an initial learning rate (lr) of 0.01, and the lr is reduced by a factor of 10 after every 60 epochs up to 300 epochs. (A hyperparameter sketch appears after this table.)
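
The two activations assessed above, ErfAct and Pserf, are trainable: the Experiment Setup row quotes their initial parameter values. Below is a minimal sketch of both as learnable modules, assuming the functional forms ErfAct(x) = x·erf(α·exp(βx)) and Pserf(x) = x·erf(γ·softplus(δx)) from the paper's extended arXiv version, and assuming PyTorch, which the paper does not name.

```python
# Hedged sketch of the proposed trainable activations. The functional forms
# are assumptions taken from the extended arXiv version of the paper; only
# the initial parameter values are quoted from the Experiment Setup row.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ErfAct(nn.Module):
    """ErfAct(x) = x * erf(alpha * exp(beta * x)) with trainable alpha, beta."""

    def __init__(self, alpha: float = 0.75, beta: float = 0.75):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(alpha))
        self.beta = nn.Parameter(torch.tensor(beta))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.erf(self.alpha * torch.exp(self.beta * x))


class Pserf(nn.Module):
    """Pserf(x) = x * erf(gamma * softplus(delta * x)) with trainable gamma, delta."""

    def __init__(self, gamma: float = 1.25, delta: float = 0.85):
        super().__init__()
        self.gamma = nn.Parameter(torch.tensor(gamma))
        self.delta = nn.Parameter(torch.tensor(delta))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.erf(self.gamma * F.softplus(self.delta * x))
```

Either module can be dropped into a network wherever ReLU, Swish, or Mish would normally appear, e.g. `nn.Sequential(nn.Linear(128, 64), ErfAct(), nn.Linear(64, 10))`.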
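
All of the image-classification datasets cited in the Open Datasets row are publicly downloadable. A minimal loading sketch, assuming torchvision as the data-loading library (the paper does not state one) and a hypothetical `./data` root:

```python
# Hedged sketch: download a few of the cited image-classification datasets.
# torchvision and the "./data" root directory are assumptions, not details
# stated in the paper.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

mnist    = datasets.MNIST("./data", train=True, download=True, transform=to_tensor)
fmnist   = datasets.FashionMNIST("./data", train=True, download=True, transform=to_tensor)
svhn     = datasets.SVHN("./data", split="train", download=True, transform=to_tensor)
cifar10  = datasets.CIFAR10("./data", train=True, download=True, transform=to_tensor)
cifar100 = datasets.CIFAR100("./data", train=True, download=True, transform=to_tensor)
```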
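
The hyperparameters quoted in the Experiment Setup row map directly onto standard optimizer and learning-rate-scheduler objects. A sketch follows, again assuming PyTorch; `model` is a hypothetical placeholder for the networks used in the paper.

```python
# Hedged sketch of the quoted training configuration; PyTorch and the
# placeholder `model` are assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 100))  # placeholder network

# CIFAR-style setup: SGD with 0.9 momentum and 5e-4 weight decay, initial
# learning rate 0.01, cosine annealing over the 200 training epochs,
# batch size 128.
sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(sgd, T_max=200)

# Tiny ImageNet setup: Adam with initial learning rate 0.01, reduced by a
# factor of 10 every 60 epochs for 300 epochs, batch size 32.
adam = torch.optim.Adam(model.parameters(), lr=0.01)
step = torch.optim.lr_scheduler.StepLR(adam, step_size=60, gamma=0.1)

# A training loop would call cosine.step() (or step.step()) once per epoch
# to apply the corresponding schedule.
```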