Non-Euclidean Universal Approximation

Authors: Anastasis Kratsios, Ievgen Bilokopytov

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To illustrate the effect of (in)correctly choosing the network's input and output layers, we implement different DNNs whose initial or final layers are built using the above examples or intentionally violate Assumptions 3.1 or 3.2. Our implementations use the California housing dataset [31], with the objective of predicting the median housing value from a set of economic and geo-spatial factors as described in [18]. The test set consists of 30% of the total 20k training instances. The implemented networks are of the form ρ ∘ f ∘ φ, where f = W2 ∘ σ ∘ W1 is a shallow feed-forward network with ReLU activation and ρ, φ are built using the above examples. Our reference model (Vanilla) is simply the shallow feed-forward network f. For the first DNN, denoted (Bad), ρ and φ are given as in Example 3.24 and therefore violate Assumption 3.1. For the second DNN, denoted (Good), ρ and φ are as in Example 3.23 and Assumptions 3.1 and 3.2 are met. For the final DNN, denoted (Rand), ρ is as in Example 3.23 and φ is as in Example 3.26, where the pre-trained network is generated randomly following Corollary 3.20.

Table 1: Training and test predictive performance.

            Training                          Test
Model   Good    Rand    Bad     Vanilla   Good    Rand    Bad     Vanilla
MAE     0.318   0.320   0.876   0.320     0.252   0.296   0.887   0.284
MSE     0.247   0.259   1.355   0.257     0.174   0.234   1.409   0.209
MAPE    16.714  17.626  48.051  17.427    12.921  15.668  48.698  14.878

As anticipated, Table 1 shows that selecting the network's initial and final layers according to our method either improves performance (Good) when all involved parameters are trainable, or does not significantly affect it even when nearly every parameter is random (Rand). However, disregarding Assumptions 3.1 and 3.2 when adding additional deep layers dramatically degrades predictive performance, as is the case for (Bad)."
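The MAE, MSE, and MAPE figures reported in Table 1 correspond to the standard regression metrics. A minimal sketch of how they are computed, using toy values rather than the paper's data or code:

```python
import numpy as np

def mae(y, y_hat):
    # Mean absolute error
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    # Mean squared error
    return np.mean((y - y_hat) ** 2)

def mape(y, y_hat):
    # Mean absolute percentage error, in percent
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

# Toy targets and predictions, purely illustrative
y = np.array([2.0, 4.0, 5.0])
y_hat = np.array([2.5, 3.0, 5.0])
print(mae(y, y_hat))  # 0.5
```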
Table 1 shows that if a DNN's first and final layers are structured according to Theorem 3.10, then expressibility can be improved, even if these layers violate the minimum width bounds of [29, 45]. Python code for these implementations is available at [33].
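For concreteness, the composed architecture ρ ∘ f ∘ φ described above can be sketched in NumPy. The layer widths are illustrative assumptions, and ρ, φ are shown as identity placeholders; the paper's actual choices (Examples 3.23, 3.24, 3.26) are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shallow feed-forward core f = W2 ∘ σ ∘ W1 with ReLU activation σ.
# Input width 8 matches the California housing feature count; the
# hidden width 32 is a hypothetical choice.
d_in, d_hidden = 8, 32
W1 = rng.normal(size=(d_hidden, d_in))
b1 = np.zeros(d_hidden)
W2 = rng.normal(size=(1, d_hidden))
b2 = np.zeros(1)

def f(x):
    # ReLU hidden layer followed by a linear readout
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

# Placeholder input/output maps standing in for the paper's φ and ρ
def phi(x):
    return x

def rho(y):
    return y

x = rng.normal(size=d_in)   # one synthetic input vector
y_hat = rho(f(phi(x)))      # composed model ρ ∘ f ∘ φ
```

The point of the paper's experiment is that only ρ and φ change between the (Good), (Bad), and (Rand) models; the core f stays the same shallow network.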
Researcher Affiliation | Academia | Department of Mathematics, Eidgenössische Technische Hochschule Zürich, HG G 32.3, Rämistrasse 101, 8092 Zürich, Switzerland. Email: anastasis.kratsios@math.ethz.ch. Department of Mathematical and Statistical Sciences, University of Alberta, 11324 89 Ave NW, Edmonton, AB T6G 2J5, Canada. Email: bilokopy@ualberta.ca.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Python code for these implementations is available at [33]."
Open Datasets | Yes | "Our implementations use the California housing dataset [31], with the objective of predicting the median housing value from a set of economic and geo-spatial factors as described in [18]." Reference [31] is listed as: Kaggle. California housing prices. https://www.kaggle.com/camnugent/california-housing-prices. Accessed: 2020-05-15.
Dataset Splits | No | "The test set consists of 30% of the total 20k training instances." This specifies the test split and the total instance count, but does not explicitly mention or quantify a separate validation split.
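The reported 70/30 split can be sketched as follows; this is a generic reconstruction on synthetic data (the paper does not state how the split was drawn, so a uniform random shuffle is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000                          # total instances reported in the paper
X = rng.normal(size=(n, 8))         # synthetic stand-in for the features

idx = rng.permutation(n)            # random shuffle (assumed, not stated)
n_test = int(0.30 * n)              # 30% held out for testing
test_idx, train_idx = idx[:n_test], idx[n_test:]
print(len(train_idx), len(test_idx))  # 14000 6000
```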
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions "Python code" but does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper describes the general form of the implemented networks (ρ ∘ f ∘ φ, where f = W2 ∘ σ ∘ W1 is a shallow feed-forward network with ReLU activation) and the dataset size, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) or detailed training configurations.