Avoiding Kernel Fixed Points: Computing with ELU and GELU Infinite Networks

Authors: Russell Tsuchida, Tim Pearce, Chris van der Heide, Fred Roosta, Marcus Gallagher

AAAI 2021, pp. 9967-9977 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Firstly, we derive the covariance functions of multi-layer perceptrons (MLPs) with exponential linear units (ELU) and Gaussian error linear units (GELU) and evaluate the performance of the limiting Gaussian processes on some benchmarks. Secondly, and more generally, we analyse the fixed-point dynamics of iterated kernels corresponding to a broad range of activation functions. (A minimal numerical sketch of this kernel iteration appears after the table.)
Researcher Affiliation | Academia | Russell Tsuchida (1,2), Tim Pearce (3), Chris van der Heide (2), Fred Roosta (2,4), Marcus Gallagher (2); 1 CSIRO, 2 The University of Queensland, 3 University of Cambridge, 4 International Computer Science Institute
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labelled algorithm sections or code-like formatted procedures).
Open Source Code | Yes | Software at github.com/RussellTsuchida/ELUGELUkernels
Open Datasets | Yes | We compare the performance of GP regression models using ReLU, LReLU, ERF and GELU kernels on a popular Bayesian deep learning benchmark (Hernández-Lobato and Adams 2015).
Dataset Splits | Yes | Figure 6 shows benchmark results for single-hidden-layer GPs using a 90%/10% training/test split.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or other detailed machine specifications) used for running its experiments.
Software Dependencies | No | The paper mentions software (e.g., the Neural Tangents library, and Python inferred from context and the GitHub link) but does not provide specific version numbers for these software components or their dependencies.
Experiment Setup | Yes | All data was standardised to have mean 0 and variance 1. We varied the depth ℓ ∈ [1, 32] in steps of 1 and the weight and bias variances (which were constrained to be equal in each layer) σ_w^2 ∈ [0.1, 5] in steps of 0.1. (A minimal GP-regression sketch using this setup appears after the table.)
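
The "Research Type" row quotes the paper's two contributions: closed-form covariance functions for ELU and GELU networks, and an analysis of the fixed-point dynamics of the iterated kernel. The snippet below is a minimal numerical illustration of that iteration, assuming the standard NNGP recursion K^{l+1} = σ_b^2 + σ_w^2 E[φ(u)φ(u')] with (u, u') ~ N(0, K^l); it estimates the expectation by Monte Carlo rather than with the paper's closed-form ELU/GELU kernels, and the function names and parameter values are illustrative.

```python
import numpy as np
from scipy.special import erf

# Activations: GELU in its exact erf form, ELU with alpha = 1.
gelu = lambda z: 0.5 * z * (1.0 + erf(z / np.sqrt(2.0)))
elu = lambda z: np.where(z > 0.0, z, np.expm1(np.minimum(z, 0.0)))

def iterate_kernel(cov, phi, sigma_w2, sigma_b2, depth, n_samples=200_000, seed=0):
    """Iterate the limiting-kernel map K_{l+1} = sigma_b2 + sigma_w2 * E[phi(u) phi(u)^T]
    with u ~ N(0, K_l), estimating the expectation by Monte Carlo. This is only a
    numerical check; the paper derives closed-form ELU/GELU expressions instead."""
    rng = np.random.default_rng(seed)
    for _ in range(depth):
        u = rng.multivariate_normal(np.zeros(2), cov, size=n_samples)  # pre-activations
        a = phi(u)                                                     # post-activations
        cov = sigma_b2 + sigma_w2 * (a.T @ a) / n_samples              # next-layer covariance
    return cov

# Two standardised inputs whose input-layer correlation is 0.5.
cov0 = np.array([[1.0, 0.5], [0.5, 1.0]])
for depth in (1, 4, 16, 32):
    K = iterate_kernel(cov0, gelu, sigma_w2=1.2, sigma_b2=1.2, depth=depth)
    rho = K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])
    print(depth, rho)  # rho settling to a constant signals a kernel fixed point
```

Swapping `gelu` for `elu` (or any other activation) and sweeping `sigma_w2`/`sigma_b2` reproduces, numerically, the kind of depth-wise correlation dynamics whose fixed points the paper analyses.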
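The "Dataset Splits" and "Experiment Setup" rows describe GP regression on standardised data with a 90%/10% train/test split, over a grid of depths and (equal) weight and bias variances. Below is a minimal sketch of that pipeline on synthetic stand-in data; it uses the well-known ReLU (arc-cosine) kernel recursion rather than the paper's new ELU/GELU kernels, whose closed forms are not reproduced here, and the noise variance and hyperparameter values are illustrative rather than taken from the paper.

```python
import numpy as np

def relu_layer(K, sigma_w2, sigma_b2):
    """One step of the ReLU (arc-cosine, degree-1) NNGP kernel recursion on a
    full Gram matrix K; a stand-in for the paper's ELU/GELU kernels."""
    diag = np.sqrt(np.diag(K))
    norm = np.outer(diag, diag)
    cos_t = np.clip(K / norm, -1.0, 1.0)
    theta = np.arccos(cos_t)
    return sigma_b2 + (sigma_w2 / (2.0 * np.pi)) * norm * (np.sin(theta) + (np.pi - theta) * cos_t)

def nngp_gram(X1, X2, depth, sigma_w2, sigma_b2):
    """Gram blocks of the depth-layer limiting kernel between the rows of X1 and X2."""
    X = np.vstack([X1, X2])
    K = sigma_b2 + sigma_w2 * (X @ X.T) / X.shape[1]   # input-layer covariance
    for _ in range(depth):
        K = relu_layer(K, sigma_w2, sigma_b2)
    n1 = len(X1)
    return K[:n1, n1:], K[n1:, n1:], K[:n1, :n1]       # cross, test, train blocks

# Synthetic stand-in for a UCI regression benchmark.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

# Standardise all data to mean 0 and variance 1, then take a 90%/10% split.
X = (X - X.mean(0)) / X.std(0)
y = (y - y.mean()) / y.std()
n_train = int(0.9 * len(X))
Xtr, Xte, ytr, yte = X[:n_train], X[n_train:], y[:n_train], y[n_train:]

# GP regression with a depth-3 limiting kernel; sweeping depth over [1, 32] and
# sigma_w2 = sigma_b2 over [0.1, 5] would mirror the paper's grid.
K_cross, K_test, K_train = nngp_gram(Xtr, Xte, depth=3, sigma_w2=1.5, sigma_b2=1.5)
noise = 0.1                                            # observation-noise variance (illustrative)
solve = lambda b: np.linalg.solve(K_train + noise * np.eye(n_train), b)
mean = K_cross.T @ solve(ytr)                          # predictive mean
var = np.diag(K_test) - np.sum(K_cross * solve(K_cross), axis=0)  # predictive variance
print("test RMSE:", np.sqrt(np.mean((mean - yte) ** 2)),
      "mean predictive std:", np.sqrt(var).mean())
```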