Periodic Activation Functions Induce Stationarity
Authors: Lassi Meronen, Martin Trapp, Arno Solin
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In a series of experiments, we show that periodic activation functions obtain comparable performance for in-domain data and capture sensitivity to perturbed inputs in deep neural networks for out-of-domain detection. (Sec. 4, Experiments:) To assess the performance of stationarity-inducing activation functions, we compared the in-domain and out-of-domain predictive uncertainty to non-stationary and locally stationary models on various benchmark data sets. In all experiments, the compared models differ only by the activation function and the respective weight priors of the last hidden layer of the network. |
| Researcher Affiliation | Collaboration | Lassi Meronen (Aalto University / Saab Finland Oy, Espoo, Finland) lassi.meronen@aalto.fi; Martin Trapp (Aalto University, Espoo, Finland) martin.trapp@aalto.fi; Arno Solin (Aalto University, Espoo, Finland) arno.solin@aalto.fi |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The codes and data to replicate the results are available under MIT license at https://github.com/AaltoML/PeriodicBNN. |
| Open Datasets | Yes | Illustrative toy examples: Fig. 1 shows predictive densities for non-stationary, locally stationary, and globally stationary activation functions on the banana classification task. UCI benchmarks: Table 1 shows results on UCI [8] regression data sets comparing deep neural networks with ReLU, locally stationary RBF [60], and locally stationary Matérn-3/2 [34] against globally stationary models. Detection of distribution shift with rotated MNIST: Fig. 5 demonstrates the predictive behaviour of different activation functions under covariate shift. We evaluate the predictive accuracy, mean confidence of the predicted class, and NLPD on MNIST [25] for different rotations of the input. Out-of-distribution detection using CIFAR-10, CIFAR-100, and SVHN: Fig. 6 shows model performance on OOD detection in image classification for ReLU, locally stationary Matérn-3/2, and globally stationary Matérn-3/2 models. The models have been trained using SWAG [29] on CIFAR-10 [23], and tested on CIFAR-10, CIFAR-100 [23], and SVHN [39] test set images. |
| Dataset Splits | Yes | Table 1 lists root mean square error (RMSE) and negative log predictive density (NLPD), which captures the predictive uncertainty, while the RMSE only accounts for the mean. Global stationary models provide better estimates of the target distribution in all cases while obtaining comparable RMSEs. It is important to note that the large standard deviations in Table 1 are due to the small number of data points and the fact that some splits in the 10-fold CV end up being harder than others. |
| Hardware Specification | No | We acknowledge the computational resources provided by the Aalto Science-IT project. (This statement is too general and does not specify hardware details like GPU/CPU models.) |
| Software Dependencies | No | The illustrative toy BNN examples are implemented using Turing.jl [14], GP regression results use GPflow [33], and all other experiments are implemented using PyTorch [41]. (No version numbers provided for these software dependencies.) |
| Experiment Setup | Yes | A detailed description of the experimental setup is available in App. D. The illustrative toy BNN examples are implemented using Turing.jl [14], GP regression results use GPflow [33], and all other experiments are implemented using PyTorch [41]. KFAC Laplace was used as the inference method. |
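As background for the "periodic activation functions" the table refers to, the sketch below shows the basic random-feature construction: an elementwise sine applied to a linear projection with random Gaussian weights, which yields an (approximately) shift-invariant, i.e. stationary, feature covariance. This is a minimal NumPy illustration under our own assumptions (function name, lengthscale parameter, and dimensions are hypothetical), not the authors' implementation from the AaltoML/PeriodicBNN repository:

```python
import numpy as np

rng = np.random.default_rng(0)

def sine_layer(x, W, b, lengthscale=1.0):
    """Periodic activation: elementwise sin of a linear projection.

    With Gaussian-distributed weights W and uniform phases b, the
    resulting features approximate a stationary kernel (cf. random
    Fourier features). Illustrative sketch only.
    """
    return np.sin((x @ W / lengthscale) + b)

# Toy usage: 4 inputs of dimension 8 mapped to 16 periodic features.
x = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 16))          # Gaussian weight draws
b = rng.uniform(0, 2 * np.pi, 16)     # random phases
h = sine_layer(x, W, b)
print(h.shape)  # (4, 16)
```

In the paper's setting, features of this kind replace the last hidden layer's ReLU activation, with matching weight priors, so that the induced function-space prior is stationary.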