Non-Euclidean Universal Approximation

Authors: Anastasis Kratsios, Ievgen Bilokopytov

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To illustrate the effect of (in)correctly choosing the network's input and output layers, we implement different DNNs whose initial or final layers are built using the above examples or intentionally violate Assumptions 3.1 or 3.2. Our implementations use the California housing dataset [31], with the objective of predicting the median housing value from a set of economic and geo-spatial factors as described in [18]. The test set consists of 30% of the total 20k training instances. The implemented networks are of the form ρ ∘ f ∘ φ, where f = W2 ∘ σ ∘ W1 is a shallow feed-forward network with ReLU activation and ρ, φ are built using the above examples. Our reference model (Vanilla) is simply the shallow feed-forward network f. For the first DNN, denoted (Bad), ρ and φ are given as in Example 3.24 and therefore violate Assumption 3.1. For the second DNN, denoted (Good), ρ and φ are as in Example 3.23 and Assumptions 3.1 and 3.2 are met. For the final DNN, denoted (Rand), ρ is as in Example 3.23 and φ is as in Example 3.26, where the pre-trained network is generated randomly following Corollary 3.20.

Table 1: Training and test predictive performance.

            Training                          Test
Model   Good    Rand    Bad     Vanilla   Good    Rand    Bad     Vanilla
MAE     0.318   0.320   0.876   0.320     0.252   0.296   0.887   0.284
MSE     0.247   0.259   1.355   0.257     0.174   0.234   1.409   0.209
MAPE    16.714  17.626  48.051  17.427    12.921  15.668  48.698  14.878

As anticipated, Table 1 shows that selecting the network's initial and final layers according to our method either improves performance (Good) when all involved parameters are trainable, or does not significantly affect it even when nearly every parameter is random (Rand). However, disregarding Assumptions 3.1 and 3.2 when adding additional deep layers dramatically degrades predictive performance, as is the case for (Bad)."
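The MAE, MSE, and MAPE figures reported in Table 1 correspond to the standard regression metrics. A minimal sketch of how they are computed, using toy values rather than the paper's data or code:

```python
import numpy as np

def mae(y, y_hat):
    # Mean absolute error
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    # Mean squared error
    return np.mean((y - y_hat) ** 2)

def mape(y, y_hat):
    # Mean absolute percentage error, in percent
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

# Toy targets and predictions, purely illustrative
y = np.array([2.0, 4.0, 5.0])
y_hat = np.array([2.5, 3.0, 5.0])
print(mae(y, y_hat))  # 0.5
```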
Table 1 shows that if a DNN's first and final layers are structured according to Theorem 3.10, then expressibility can be improved, even if these layers violate the minimum width bounds of [29, 45]. Python code for these implementations is available at [33].
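For concreteness, the composed architecture ρ ∘ f ∘ φ described above can be sketched in NumPy. The layer widths are illustrative assumptions, and ρ, φ are shown as identity placeholders; the paper's actual choices (Examples 3.23, 3.24, 3.26) are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shallow feed-forward core f = W2 ∘ σ ∘ W1 with ReLU activation σ.
# Input width 8 matches the California housing feature count; the
# hidden width 32 is a hypothetical choice.
d_in, d_hidden = 8, 32
W1 = rng.normal(size=(d_hidden, d_in))
b1 = np.zeros(d_hidden)
W2 = rng.normal(size=(1, d_hidden))
b2 = np.zeros(1)

def f(x):
    # ReLU hidden layer followed by a linear readout
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

# Placeholder input/output maps standing in for the paper's φ and ρ
def phi(x):
    return x

def rho(y):
    return y

x = rng.normal(size=d_in)   # one synthetic input vector
y_hat = rho(f(phi(x)))      # composed model ρ ∘ f ∘ φ
```

The point of the paper's experiment is that only ρ and φ change between the (Good), (Bad), and (Rand) models; the core f stays the same shallow network.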
Researcher Affiliation | Academia | Department of Mathematics, Eidgenössische Technische Hochschule Zürich, HG G 32.3, Rämistrasse 101, 8092 Zürich, Switzerland. Email: anastasis.kratsios@math.ethz.ch. Department of Mathematical and Statistical Sciences, University of Alberta, 11324 89 Ave NW, Edmonton, AB T6G 2J5, Canada. Email: bilokopy@ualberta.ca.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Python code for these implementations is available at [33]."
Open Datasets | Yes | "Our implementations use the California housing dataset [31], with the objective of predicting the median housing value from a set of economic and geo-spatial factors as described in [18]." Reference [31] is listed as: Kaggle. California housing prices. https://www.kaggle.com/camnugent/california-housing-prices. Accessed: 2020-05-15.
Dataset Splits | No | "The test set consists of 30% of the total 20k training instances." This specifies the test split and the total instance count, but does not explicitly mention or quantify a separate validation split.
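The reported 70/30 split can be sketched as follows; this is a generic reconstruction on synthetic data (the paper does not state how the split was drawn, so a uniform random shuffle is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000                          # total instances reported in the paper
X = rng.normal(size=(n, 8))         # synthetic stand-in for the features

idx = rng.permutation(n)            # random shuffle (assumed, not stated)
n_test = int(0.30 * n)              # 30% held out for testing
test_idx, train_idx = idx[:n_test], idx[n_test:]
print(len(train_idx), len(test_idx))  # 14000 6000
```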
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions "Python code" but does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper describes the general form of the implemented networks (ρ ∘ f ∘ φ, where f = W2 ∘ σ ∘ W1 is a shallow feed-forward network with ReLU activation) and the dataset size, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) or detailed training configurations.