Avoiding Kernel Fixed Points: Computing with ELU and GELU Infinite Networks
Authors: Russell Tsuchida, Tim Pearce, Chris van der Heide, Fred Roosta, Marcus Gallagher
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Firstly, we derive the covariance functions of multi-layer perceptrons (MLPs) with exponential linear units (ELU) and Gaussian error linear units (GELU) and evaluate the performance of the limiting Gaussian processes on some benchmarks. Secondly, and more generally, we analyse the fixed-point dynamics of iterated kernels corresponding to a broad range of activation functions. |
| Researcher Affiliation | Academia | Russell Tsuchida (1,2), Tim Pearce (3), Chris van der Heide (2), Fred Roosta (2,4), Marcus Gallagher (2); 1 CSIRO, 2 The University of Queensland, 3 University of Cambridge, 4 International Computer Science Institute |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | Yes | 1Software at github.com/RussellTsuchida/ELUGELUkernels |
| Open Datasets | Yes | We compare the performance of GP regression models using ReLU, LReLU, ERF and GELU kernels on a popular Bayesian deep learning benchmark (Hernández-Lobato and Adams 2015). |
| Dataset Splits | Yes | Figure 6 shows benchmark results for single-hidden-layer GPs using a 90%/10% training/test split. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software (e.g., 'Neural Tangents' library, Python inferred from context and GitHub link) but does not provide specific version numbers for these software components or their dependencies. |
| Experiment Setup | Yes | All data was standardised to have mean 0 and variance 1. We varied the depth ℓ ∈ [1, 32] in steps of 1 and the weight and bias variances (which were constrained to be equal in each layer) σ²_w ∈ [0.1, 5] in steps of 0.1. (See the hedged sketches below this table.) |
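The first quoted contribution (Research Type row) is the derivation of covariance functions for ELU and GELU networks together with a fixed-point analysis of the iterated kernel. The sketch below does not reproduce the paper's closed-form kernels; it estimates the same layer-wise recursion, k_{ℓ+1}(x, x') = σ²_b + σ²_w E[φ(u)φ(v)] with (u, v) bivariate Gaussian under the previous layer's kernel, by Monte Carlo, and tracks the normalised kernel (correlation) through depth, which is the quantity whose fixed-point dynamics the paper analyses. Inputs, variances and sample counts are illustrative assumptions.

```python
# Hedged sketch: Monte Carlo estimate of the layer-wise kernel recursion for
# ELU/GELU activations (NOT the paper's closed-form expressions).
import numpy as np
from scipy.stats import norm

def gelu(x):
    return x * norm.cdf(x)          # GELU(x) = x * Phi(x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def next_kernel(kxx, kxy, kyy, phi, sigma_w2, sigma_b2, rng, n_samples=200_000):
    """One step of k_{l+1} = sigma_b^2 + sigma_w^2 * E[phi(u) phi(v)], by Monte Carlo."""
    cov = np.array([[kxx, kxy], [kxy, kyy]])
    u, v = rng.multivariate_normal(np.zeros(2), cov, size=n_samples).T
    return (sigma_b2 + sigma_w2 * np.mean(phi(u) ** 2),
            sigma_b2 + sigma_w2 * np.mean(phi(u) * phi(v)),
            sigma_b2 + sigma_w2 * np.mean(phi(v) ** 2))

def correlation_through_depth(x, y, phi, depth, sigma_w2=2.0, sigma_b2=0.1, seed=0):
    """Track the normalised kernel k(x,x')/sqrt(k(x,x)k(x',x')) across layers."""
    rng = np.random.default_rng(seed)
    d = x.size
    kxx = sigma_b2 + sigma_w2 * (x @ x) / d   # first-layer kernel from the inputs
    kxy = sigma_b2 + sigma_w2 * (x @ y) / d
    kyy = sigma_b2 + sigma_w2 * (y @ y) / d
    corrs = []
    for _ in range(depth):
        kxx, kxy, kyy = next_kernel(kxx, kxy, kyy, phi, sigma_w2, sigma_b2, rng)
        corrs.append(kxy / np.sqrt(kxx * kyy))
    return corrs

x, y = np.array([1.0, -0.5]), np.array([0.3, 0.8])
print("GELU:", np.round(correlation_through_depth(x, y, gelu, depth=8), 3))
print("ELU: ", np.round(correlation_through_depth(x, y, elu, depth=8), 3))
```

Whether the correlation sequence is driven towards a degenerate fixed point (e.g. 1) as depth grows, and how quickly, is the behaviour the paper characterises analytically for a broad range of activations.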
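The benchmark rows (Open Datasets, Dataset Splits, Experiment Setup) describe single-hidden-layer GP regression on data standardised to mean 0 and variance 1 with a 90%/10% train/test split. The sketch below follows that protocol but is not a reproduction of the paper's experiments: it substitutes the standard ReLU (arc-cosine) NNGP kernel for the paper's ELU/GELU kernels, uses synthetic data rather than the Hernández-Lobato and Adams (2015) benchmarks, and the noise variance and kernel hyperparameters are hypothetical.

```python
# Hedged sketch of the benchmark protocol: standardise, 90/10 split, exact GP
# regression with a single-hidden-layer NNGP kernel (ReLU/arc-cosine stand-in).
import numpy as np

def relu_kernel(X1, X2, sigma_w2=1.0, sigma_b2=0.1):
    """Single-hidden-layer NNGP kernel for ReLU: sigma_b^2 + sigma_w^2 * E[ReLU(u)ReLU(v)]."""
    d = X1.shape[1]
    k11 = sigma_b2 + sigma_w2 * np.sum(X1 ** 2, axis=1) / d   # k(x, x)
    k22 = sigma_b2 + sigma_w2 * np.sum(X2 ** 2, axis=1) / d   # k(x', x')
    k12 = sigma_b2 + sigma_w2 * (X1 @ X2.T) / d               # k(x, x')
    norm = np.sqrt(np.outer(k11, k22))
    theta = np.arccos(np.clip(k12 / norm, -1.0, 1.0))
    return sigma_b2 + sigma_w2 / (2 * np.pi) * norm * (np.sin(theta) + (np.pi - theta) * np.cos(theta))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                       # illustrative synthetic data
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# Standardise features and targets to mean 0, variance 1, then 90%/10% split.
X = (X - X.mean(0)) / X.std(0)
y = (y - y.mean()) / y.std()
idx = rng.permutation(len(X))
tr, te = idx[:180], idx[180:]

# Exact GP regression with observation noise sigma_n^2 (hypothetical value).
sigma_n2 = 0.1
K = relu_kernel(X[tr], X[tr]) + sigma_n2 * np.eye(len(tr))
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y[tr]))
mean = relu_kernel(X[te], X[tr]) @ alpha
print("test RMSE:", np.sqrt(np.mean((mean - y[te]) ** 2)))
```

Repeating this fit while sweeping the depth ℓ ∈ [1, 32] and the variance σ²_w ∈ [0.1, 5] in steps of 0.1, with the paper's ELU/GELU kernels in place of the stand-in above, corresponds to the grid described in the Experiment Setup row; the released code at github.com/RussellTsuchida/ELUGELUkernels is the authoritative implementation.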