Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Wide Neural Networks with Bottlenecks are Deep Gaussian Processes
Authors: Devanshu Agrawal, Theodore Papamarkou, Jacob Hinkle
JMLR 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the utility of bottleneck NNGPs and their link to no-bottleneck NNGPs empirically, showing that restricting a hidden layer of an NNGP to a bottleneck can boost its model likelihood on three example datasets. We also characterize the effect of a bottleneck layer theoretically by analyzing an example multi-output single-bottleneck NNGP with rectified linear unit (ReLU) activation. We investigate this question on a simulated dataset that we call Rings and on two publicly available datasets: Fisher's Iris dataset (Anderson, 1935; Fisher, 1936) and the US Census Boston housing prices dataset (Harrison and Rubinfeld, 1978). |
| Researcher Affiliation | Academia | Devanshu Agrawal, EMAIL, The Bredesen Center, University of Tennessee, Knoxville, TN 37996-3394, USA; Theodore Papamarkou, EMAIL, Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830-8050, USA; Jacob Hinkle, EMAIL, Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830-8050, USA |
| Pseudocode | No | The paper primarily presents mathematical derivations and proofs, along with experimental results and theoretical analysis. It does not include any explicitly labeled pseudocode or algorithm blocks describing a method or procedure in a structured format. |
| Open Source Code | Yes | Code for our simulations and experiments is available at https://code.ornl.gov/d0a/bottleneck_nngp. |
| Open Datasets | Yes | We investigate this question on a simulated dataset that we call Rings and on two publicly available datasets: Fisher's Iris dataset (Anderson, 1935; Fisher, 1936) and the US Census Boston housing prices dataset (Harrison and Rubinfeld, 1978). |
| Dataset Splits | No | The paper describes the creation and labeling of the Rings dataset and mentions using the Iris and Boston House-Prices datasets, but it does not specify any training, validation, or test split percentages or sample counts for any of these datasets. For example, it does not state '80/10/10 split' or '40,000 training samples'. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as exact GPU or CPU models, processor types, or memory specifications. Generic terms like 'on a GPU' are not present either. |
| Software Dependencies | No | The paper mentions using the Adam optimizer and normalized ReLU activation, but it does not specify the versions of any programming languages, libraries, or frameworks (e.g., Python version, PyTorch version, TensorFlow version) that would be needed to reproduce the experiments. |
| Experiment Setup | Yes | We found the optimal variance hyperparameters iteratively through gradient descent. During the forward pass through the network in each iteration, we estimated the integral in Eq. (41) by drawing 100 IID Monte Carlo (MC) samples... We used the Adam optimizer (Kingma and Ba, 2014) to take advantage of the gradient noise generated by MC sampling during optimization; we set the initial learning rate to 0.1. ...if the new MLL was less than the value obtained from the initial forward pass of the iteration, then we multiplied the learning rate by 0.9. ...once complete, we evaluated Eq. (10) once more, this time with 1000 MC samples, to obtain the final MLL estimate for each network architecture. |
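The optimization loop quoted in the Experiment Setup row (Adam ascent on an MC-estimated marginal log-likelihood, with the learning rate decayed by 0.9 whenever the MLL estimate drops, and a final 1000-sample evaluation) can be sketched as below. This is a toy illustration, not the paper's code: the objective `mc_mll_and_grad`, the starting point, and all variable names are hypothetical stand-ins for the NNGP variance hyperparameters and the integral in Eq. (41).

```python
import math
import random

random.seed(0)

def mc_mll_and_grad(theta, n_samples=100):
    """Hypothetical stand-in for the MC-estimated marginal log-likelihood:
    a smooth objective peaked at theta = 0, with sampling noise that
    shrinks as 1/sqrt(n_samples). Returns (value, gradient)."""
    noise = random.gauss(0.0, 1.0) / math.sqrt(n_samples)
    value = -(theta ** 2) + 0.1 * noise
    grad = -2.0 * theta + 0.1 * noise  # MC noise leaks into the gradient
    return value, grad

def fit_hyperparameter(theta=2.0, lr=0.1, steps=200,
                       b1=0.9, b2=0.999, eps=1e-8):
    """Adam ascent on the noisy MLL estimate, decaying lr by 0.9
    whenever the new MLL estimate is below the previous one."""
    m = v = 0.0
    prev, _ = mc_mll_and_grad(theta)
    for t in range(1, steps + 1):
        _, g = mc_mll_and_grad(theta)          # 100-sample MC gradient
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        mhat = m / (1 - b1 ** t)               # Adam bias correction
        vhat = v / (1 - b2 ** t)
        theta += lr * mhat / (math.sqrt(vhat) + eps)  # ascent step
        cur, _ = mc_mll_and_grad(theta)
        if cur < prev:     # MLL estimate dropped: decay the learning rate,
            lr *= 0.9      # mirroring the schedule described in the paper
        prev = cur
    # Final evaluation with 10x more MC samples for a lower-variance estimate
    final_mll, _ = mc_mll_and_grad(theta, n_samples=1000)
    return theta, final_mll
```

The lr-decay-on-regression schedule is a simple guard against the MC noise that Adam is otherwise exploiting: as the iterate nears the optimum, noisy dips become more frequent and the step size shrinks automatically.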