Fractional Adaptive Linear Units
Authors: Julio Zamora, Anthony D. Rhodes, Lama Nachman
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experiments on a variety of conventional tasks and network architectures, we demonstrate the effectiveness of FALUs when compared to traditional and state-of-the-art AFs. |
| Researcher Affiliation | Industry | Intel Labs julio.c.zamora.esquivel@intel.com, anthony.rhodes@intel.com, lama.nachman@intel.com |
| Pseudocode | No | The paper describes mathematical formulas and derivations but does not include a structured pseudocode or algorithm block. |
| Open Source Code | No | To facilitate practical use of this work, we plan to make our code publicly available. |
| Open Datasets | Yes | MNIST (LeCun and Cortes 2010), Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017), CIFAR-10 (Krizhevsky 2009), ImageNet (Deng et al. 2009) |
| Dataset Splits | No | For each dataset we use conventional train/test splits used in literature. MNIST consists of 60,000 (50k/10k train/test split) 28×28 resolution gray scale images in 10 classes, with 6,000 images per class. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'standard automatic differentiation workflows' and 'standard Deep Learning libraries', and 'Adam optimizer (Kingma and Ba 2014)' but does not specify software names with version numbers. |
| Experiment Setup | Yes | For each experiment we used the Adam optimizer (Kingma and Ba 2014) to train our model, and randomly initialized the FALU parameters in the range α ∈ [0, 1] and β ∈ [1, 1+ϵ], with ϵ = 0.05;...the FALU function parameters were clamped during training within the domains described previously, i.e., α ∈ [0, 2] and β ∈ [1, 10]. The model was trained for 120 epochs with an initial learning rate of 0.01 decayed by an order of magnitude every 30 epochs, batch size of 128, and random weight initialization. |
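
The setup quoted above is concrete enough to sketch in code. The following is a minimal, hypothetical PyTorch rendering of the reported hyperparameters: parameter initialization α ∈ [0, 1] and β ∈ [1, 1+ϵ] with ϵ = 0.05, clamping to α ∈ [0, 2] and β ∈ [1, 10] during training, Adam with an initial learning rate of 0.01 decayed by 10× every 30 epochs over 120 epochs, and batch size 128. The `LearnableActivation` class and the identity `forward` are placeholders, since the actual FALU formula is defined in the paper and not reproduced here.

```python
# Hedged sketch of the paper's reported experiment setup, assuming PyTorch.
# `LearnableActivation` is a stand-in for FALU; only the parameter ranges,
# clamping, and optimizer schedule come from the paper's description.
import torch
import torch.nn as nn

EPSILON = 0.05  # epsilon from the reported beta initialization range


class LearnableActivation(nn.Module):
    """Placeholder for a FALU-style activation with learnable alpha/beta."""

    def __init__(self):
        super().__init__()
        # Random initialization in the reported ranges:
        # alpha ~ U[0, 1], beta ~ U[1, 1 + epsilon]
        self.alpha = nn.Parameter(torch.rand(1))
        self.beta = nn.Parameter(1.0 + EPSILON * torch.rand(1))

    def clamp_parameters(self):
        # Clamp during training to the domains described in the paper:
        # alpha in [0, 2], beta in [1, 10]
        with torch.no_grad():
            self.alpha.clamp_(0.0, 2.0)
            self.beta.clamp_(1.0, 10.0)

    def forward(self, x):
        # The real FALU formula is given in the paper; identity is used here
        # purely so the sketch runs end to end.
        return x


model = nn.Sequential(nn.Linear(784, 256), LearnableActivation(), nn.Linear(256, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# Decay the learning rate by an order of magnitude every 30 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(120):
    # ... iterate over batches of size 128, compute loss, and step the
    # optimizer here; after each step, clamp the activation parameters ...
    for module in model.modules():
        if isinstance(module, LearnableActivation):
            module.clamp_parameters()
    scheduler.step()
```

Note that clamping after each optimizer step (rather than reparameterizing) matches the quoted description that the parameters "were clamped during training within the domains described previously"; how the authors implement this exactly is not specified in the text above.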