Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Wide Neural Networks with Bottlenecks are Deep Gaussian Processes

Authors: Devanshu Agrawal, Theodore Papamarkou, Jacob Hinkle

JMLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the utility of bottleneck NNGPs and their link to no-bottleneck NNGPs empirically, showing that restricting a hidden layer of an NNGP to a bottleneck can boost its model likelihood on three example datasets. We also characterize the effect of a bottleneck layer theoretically by analyzing an example multi-output single-bottleneck NNGP with rectified linear unit (ReLU) activation. We investigate this question on a simulated dataset that we call Rings and on two publicly available datasets: Fisher's Iris dataset (Anderson, 1935; Fisher, 1936) and the US Census Boston housing prices dataset (Harrison and Rubinfeld, 1978).
Researcher Affiliation | Academia | Devanshu Agrawal (EMAIL), The Bredesen Center, University of Tennessee, Knoxville, TN 37996-3394, USA; Theodore Papamarkou (EMAIL), Computational Sciences and Engineering Division, Oak Ridge National Lab, Oak Ridge, TN 37830-8050, USA; Jacob Hinkle (EMAIL), Computational Sciences and Engineering Division, Oak Ridge National Lab, Oak Ridge, TN 37830-8050, USA
Pseudocode | No | The paper primarily presents mathematical derivations and proofs, along with experimental results and theoretical analysis. It does not include any explicitly labeled pseudocode or algorithm blocks describing a method or procedure in a structured format.
Open Source Code | Yes | Code for our simulations and experiments is available at https://code.ornl.gov/d0a/bottleneck_nngp.
Open Datasets | Yes | We investigate this question on a simulated dataset that we call Rings and on two publicly available datasets: Fisher's Iris dataset (Anderson, 1935; Fisher, 1936) and the US Census Boston housing prices dataset (Harrison and Rubinfeld, 1978).
Dataset Splits | No | The paper describes the creation and labeling of the Rings dataset and mentions using the Iris and Boston House-Prices datasets, but it does not specify any training, validation, or test split percentages or sample counts for any of these datasets. For example, it does not state '80/10/10 split' or '40,000 training samples'.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as exact GPU or CPU models, processor types, or memory specifications. Generic terms like 'on a GPU' are not present either.
Software Dependencies | No | The paper mentions using the Adam optimizer and normalized ReLU activation, but it does not specify the versions of any programming languages, libraries, or frameworks (e.g., Python version, PyTorch version, TensorFlow version) that would be needed to reproduce the experiments.
Experiment Setup | Yes | We found the optimal variance hyperparameters iteratively through gradient descent. During the forward pass through the network in each iteration, we estimated the integral in Eq. (41) by drawing 100 IID Monte Carlo (MC) samples... We used the Adam optimizer (Kingma and Ba, 2014) to take advantage of the gradient noise generated by MC sampling during optimization; we set the initial learning rate to 0.1. ...if the new MLL was less than the value obtained from the initial forward pass of the iteration, then we multiplied the learning rate by 0.9. ...once complete, we evaluated Eq. (10) once more this time with 1000 MC samples to obtain the final MLL estimate for each network architecture.
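The optimization schedule quoted above can be sketched as a small toy. This is a hypothetical stand-in: `mc_mll` is an invented concave objective with Monte Carlo noise, not the paper's NNGP marginal log-likelihood, and the Adam update is written out by hand rather than taken from any library. It does follow the quoted schedule: an initial learning rate of 0.1, a noisy 100-sample estimate per iteration, a 0.9 learning-rate decay whenever the new MLL is worse than the iteration's initial value, and a final 1000-sample estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_mll(theta, n_samples):
    # Toy stand-in for an MC-estimated marginal log-likelihood (MLL):
    # a concave function of one hyperparameter, plus sampling noise whose
    # standard deviation shrinks as 1/sqrt(n_samples).
    return -(theta - 2.0) ** 2 + rng.normal(0.0, 1.0 / np.sqrt(n_samples))

def fit_hyperparameter(theta=0.0, iters=200, lr=0.1):
    # Gradient ascent on the noisy MLL with a hand-rolled Adam update.
    m = v = 0.0
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    for t in range(1, iters + 1):
        # MLL from the iteration's initial "forward pass" (100 MC samples).
        mll_before = mc_mll(theta, n_samples=100)
        # Noisy gradient of the toy objective (noise mimics MC sampling).
        g = -2.0 * (theta - 2.0) + rng.normal(0.0, 0.1)
        # Standard Adam moment updates with bias correction.
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        theta += lr * m_hat / (np.sqrt(v_hat) + eps)  # ascent step
        # Decay the learning rate if the new MLL got worse.
        if mc_mll(theta, n_samples=100) < mll_before:
            lr *= 0.9
    # Final, lower-variance MLL estimate with 1000 MC samples.
    return theta, mc_mll(theta, n_samples=1000)

theta_opt, final_mll = fit_hyperparameter()
```

Running this drives `theta_opt` toward the toy optimum at 2.0; the decay-on-worsening rule makes the step size shrink automatically once MC noise, rather than the true gradient, dominates the updates.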