Avoiding Kernel Fixed Points: Computing with ELU and GELU Infinite Networks

Authors: Russell Tsuchida, Tim Pearce, Chris van der Heide, Fred Roosta, Marcus Gallagher

AAAI 2021, pp. 9967-9977 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Firstly, we derive the covariance functions of multi-layer perceptrons (MLPs) with exponential linear units (ELU) and Gaussian error linear units (GELU) and evaluate the performance of the limiting Gaussian processes on some benchmarks. Secondly, and more generally, we analyse the fixed-point dynamics of iterated kernels corresponding to a broad range of activation functions. (A minimal numerical sketch of this kernel iteration appears after the table.)
Researcher Affiliation | Academia | Russell Tsuchida (1,2), Tim Pearce (3), Chris van der Heide (2), Fred Roosta (2,4), Marcus Gallagher (2); 1 CSIRO, 2 The University of Queensland, 3 University of Cambridge, 4 International Computer Science Institute
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labelled algorithm sections or code-like formatted procedures).
Open Source Code | Yes | Software at github.com/RussellTsuchida/ELUGELUkernels
Open Datasets | Yes | We compare the performance of GP regression models using ReLU, LReLU, ERF and GELU kernels on a popular Bayesian deep learning benchmark (Hernández-Lobato and Adams 2015).
Dataset Splits | Yes | Figure 6 shows benchmark results for single-hidden-layer GPs using a 90%/10% training/test split.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or other detailed machine specifications) used for running its experiments.
Software Dependencies | No | The paper mentions software (e.g., the Neural Tangents library, and Python inferred from context and the GitHub link) but does not provide specific version numbers for these software components or their dependencies.
Experiment Setup | Yes | All data was standardised to have mean 0 and variance 1. We varied the depth ℓ ∈ [1, 32] in steps of 1 and the weight and bias variances (which were constrained to be equal in each layer) σ_w^2 ∈ [0.1, 5] in steps of 0.1. (A minimal GP-regression sketch using this setup appears after the table.)
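
The "Research Type" row quotes the paper's two contributions: closed-form covariance functions for ELU and GELU networks, and an analysis of the fixed-point dynamics of the iterated kernel. The snippet below is a minimal numerical illustration of that iteration, assuming the standard NNGP recursion K^{l+1} = σ_b^2 + σ_w^2 E[φ(u)φ(u')] with (u, u') ~ N(0, K^l); it estimates the expectation by Monte Carlo rather than with the paper's closed-form ELU/GELU kernels, and the function names and parameter values are illustrative.

```python
import numpy as np
from scipy.special import erf

# Activations: GELU in its exact erf form, ELU with alpha = 1.
gelu = lambda z: 0.5 * z * (1.0 + erf(z / np.sqrt(2.0)))
elu = lambda z: np.where(z > 0.0, z, np.expm1(np.minimum(z, 0.0)))

def iterate_kernel(cov, phi, sigma_w2, sigma_b2, depth, n_samples=200_000, seed=0):
    """Iterate the limiting-kernel map K_{l+1} = sigma_b2 + sigma_w2 * E[phi(u) phi(u)^T]
    with u ~ N(0, K_l), estimating the expectation by Monte Carlo. This is only a
    numerical check; the paper derives closed-form ELU/GELU expressions instead."""
    rng = np.random.default_rng(seed)
    for _ in range(depth):
        u = rng.multivariate_normal(np.zeros(2), cov, size=n_samples)  # pre-activations
        a = phi(u)                                                     # post-activations
        cov = sigma_b2 + sigma_w2 * (a.T @ a) / n_samples              # next-layer covariance
    return cov

# Two standardised inputs whose input-layer correlation is 0.5.
cov0 = np.array([[1.0, 0.5], [0.5, 1.0]])
for depth in (1, 4, 16, 32):
    K = iterate_kernel(cov0, gelu, sigma_w2=1.2, sigma_b2=1.2, depth=depth)
    rho = K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])
    print(depth, rho)  # rho settling to a constant signals a kernel fixed point
```

Swapping `gelu` for `elu` (or any other activation) and sweeping `sigma_w2`/`sigma_b2` reproduces, numerically, the kind of depth-wise correlation dynamics whose fixed points the paper analyses.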
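The "Dataset Splits" and "Experiment Setup" rows describe GP regression on standardised data with a 90%/10% train/test split, over a grid of depths and (equal) weight and bias variances. Below is a minimal sketch of that pipeline on synthetic stand-in data; it uses the well-known ReLU (arc-cosine) kernel recursion rather than the paper's new ELU/GELU kernels, whose closed forms are not reproduced here, and the noise variance and hyperparameter values are illustrative rather than taken from the paper.

```python
import numpy as np

def relu_layer(K, sigma_w2, sigma_b2):
    """One step of the ReLU (arc-cosine, degree-1) NNGP kernel recursion on a
    full Gram matrix K; a stand-in for the paper's ELU/GELU kernels."""
    diag = np.sqrt(np.diag(K))
    norm = np.outer(diag, diag)
    cos_t = np.clip(K / norm, -1.0, 1.0)
    theta = np.arccos(cos_t)
    return sigma_b2 + (sigma_w2 / (2.0 * np.pi)) * norm * (np.sin(theta) + (np.pi - theta) * cos_t)

def nngp_gram(X1, X2, depth, sigma_w2, sigma_b2):
    """Gram blocks of the depth-layer limiting kernel between the rows of X1 and X2."""
    X = np.vstack([X1, X2])
    K = sigma_b2 + sigma_w2 * (X @ X.T) / X.shape[1]   # input-layer covariance
    for _ in range(depth):
        K = relu_layer(K, sigma_w2, sigma_b2)
    n1 = len(X1)
    return K[:n1, n1:], K[n1:, n1:], K[:n1, :n1]       # cross, test, train blocks

# Synthetic stand-in for a UCI regression benchmark.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

# Standardise all data to mean 0 and variance 1, then take a 90%/10% split.
X = (X - X.mean(0)) / X.std(0)
y = (y - y.mean()) / y.std()
n_train = int(0.9 * len(X))
Xtr, Xte, ytr, yte = X[:n_train], X[n_train:], y[:n_train], y[n_train:]

# GP regression with a depth-3 limiting kernel; sweeping depth over [1, 32] and
# sigma_w2 = sigma_b2 over [0.1, 5] would mirror the paper's grid.
K_cross, K_test, K_train = nngp_gram(Xtr, Xte, depth=3, sigma_w2=1.5, sigma_b2=1.5)
noise = 0.1                                            # observation-noise variance (illustrative)
solve = lambda b: np.linalg.solve(K_train + noise * np.eye(n_train), b)
mean = K_cross.T @ solve(ytr)                          # predictive mean
var = np.diag(K_test) - np.sum(K_cross * solve(K_cross), axis=0)  # predictive variance
print("test RMSE:", np.sqrt(np.mean((mean - yte) ** 2)),
      "mean predictive std:", np.sqrt(var).mean())
```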