Random deep neural networks are biased towards simple functions

Authors: Giacomo De Palma, Bobak Kiani, Seth Lloyd

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate all the theoretical results with numerical experiments on deep neural networks with ReLU activation function and two hidden layers. The experiments confirm the scalings Θ(√(n/ln n)) and Θ(n) for the Hamming distance of the closest string with a different classification and for the average number of random flips required to change the classification, respectively. (A sketch of these two measurements appears below the table.)
Researcher Affiliation | Academia | Giacomo De Palma, Mech E & RLE, MIT, Cambridge, MA 02139, USA, gdepalma@mit.edu; Bobak T. Kiani, Mech E & RLE, MIT, Cambridge, MA 02139, USA, bkiani@mit.edu; Seth Lloyd, Mech E, Physics & RLE, MIT, Cambridge, MA 02139, USA, slloyd@mit.edu
Pseudocode | No | No pseudocode or algorithm blocks were found.
Open Source Code | No | The paper mentions using Keras and TensorFlow but does not provide a link to, or a statement about, open-sourcing its own implementation code.
Open Datasets | Yes | Moreover, we explore the Hamming distance to the closest bit string with a different classification on deep neural networks trained on the MNIST database [49] of hand-written digits.
Dataset Splits | No | The paper mentions training on MNIST and evaluating on a test set, but it does not give specific train/validation/test splits (e.g., percentages or exact counts), nor does it mention a validation set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory) used for running the experiments.
Software Dependencies | No | Simulations were run using the Python package Keras with a TensorFlow backend [68]. However, specific version numbers for these software components are not provided.
Experiment Setup | Yes | Weights for all neural networks are initialized according to a normal distribution with zero mean and variance equal to 2/n_in, where n_in is the number of input units in the weight tensor. No bias term is included in the neural networks. All networks consist of two fully connected hidden layers, each with n neurons (equal to the number of input neurons) and activation function set to the commonly used Rectified Linear Unit (ReLU). All networks contain a single output neuron with no activation function. In the notation of Section 2, this choice corresponds to σ_w² = 2, σ_b² = 0, n_0 = n_1 = n_2 = n and n_3 = 1, and implies F(1) = 1. Simulations were run using the Python package Keras with a TensorFlow backend [68]. Networks were trained for 20 epochs using the Adam optimizer [67]; an average test-set accuracy of 98.8% was achieved. (A Keras sketch of this setup appears below the table.)
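
The quoted setup maps fairly directly onto Keras. The sketch below is a minimal, hypothetical reconstruction under stated assumptions: TensorFlow 2.x, MNIST inputs flattened to n = 784 features, an illustrative even-vs-odd binary label read off the sign of the single linear output, and default Adam hyperparameters with batch size 128; the paper's exact task, preprocessing, batch size, and learning rate are not specified in the quoted text.

```python
# Hypothetical reconstruction of the quoted setup (assumes TensorFlow 2.x).
# Assumptions not in the quoted text: MNIST flattened to n = 784 inputs,
# an even-vs-odd binary label, default Adam hyperparameters, batch size 128.
from tensorflow import keras

n = 784  # inputs = 28x28 MNIST pixels, so n_0 = n_1 = n_2 = n and n_3 = 1

# Zero-mean normal initializer with variance 2 / n_in ("fan in" scaling).
init = keras.initializers.VarianceScaling(
    scale=2.0, mode="fan_in", distribution="untruncated_normal")

model = keras.Sequential([
    keras.Input(shape=(n,)),
    keras.layers.Dense(n, activation="relu", use_bias=False, kernel_initializer=init),
    keras.layers.Dense(n, activation="relu", use_bias=False, kernel_initializer=init),
    keras.layers.Dense(1, activation=None, use_bias=False, kernel_initializer=init),
])

# Illustrative binary task: even vs. odd digit, classified by the sign of the output.
(x_tr, y_tr), (x_te, y_te) = keras.datasets.mnist.load_data()
x_tr = x_tr.reshape(-1, n).astype("float32") / 255.0
x_te = x_te.reshape(-1, n).astype("float32") / 255.0
y_tr, y_te = (y_tr % 2).astype("float32"), (y_te % 2).astype("float32")

model.compile(optimizer=keras.optimizers.Adam(),
              loss=keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=[keras.metrics.BinaryAccuracy(threshold=0.0)])
model.fit(x_tr, y_tr, epochs=20, batch_size=128, validation_data=(x_te, y_te))
```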
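
The Research Type row refers to two measured quantities: the Hamming distance to the closest input bit string with a different classification, and the average number of random bit flips needed to change the classification. The NumPy sketch below illustrates both for a random network of the kind described above; the ±1 encoding of bit strings, the greedy search, and the no-repetition flip schedule are illustrative assumptions, not the paper's exact procedures.

```python
# Illustrative measurement of the two quantities from the Research Type row,
# for a random two-hidden-layer ReLU network with the initialization above.
# Assumptions not taken from the paper: bits encoded as +/-1, a greedy search
# for the closest differently classified string, flips sampled without repetition.
import numpy as np

rng = np.random.default_rng(0)
n = 100  # input dimension (number of bits)

# Random weights: zero mean, variance 2 / n_in, no bias terms.
W1 = rng.normal(0.0, np.sqrt(2.0 / n), size=(n, n))
W2 = rng.normal(0.0, np.sqrt(2.0 / n), size=(n, n))
w3 = rng.normal(0.0, np.sqrt(2.0 / n), size=n)

def output(x):
    return w3 @ np.maximum(W2 @ np.maximum(W1 @ x, 0.0), 0.0)

def classify(x):
    return np.sign(output(x))

def random_flips_to_change(x):
    """Flip bits in a random order (no repeats) until the classification changes."""
    c0, y = classify(x), x.copy()
    for k, i in enumerate(rng.permutation(n), start=1):
        y[i] = -y[i]
        if classify(y) != c0:
            return k
    return n  # classification never changed

def greedy_closest_distance(x):
    """Greedy upper bound on the Hamming distance to the closest string with a different class."""
    c0, y = classify(x), x.copy()
    remaining = set(range(n))
    for d in range(1, n + 1):
        best_i, best_margin = None, np.inf
        for i in remaining:
            y[i] = -y[i]
            if classify(y) != c0:
                return d          # differently classified string found at Hamming distance d
            margin = abs(output(y))
            y[i] = -y[i]
            if margin < best_margin:
                best_i, best_margin = i, margin
        y[best_i] = -y[best_i]    # permanently flip the bit that moves closest to the boundary
        remaining.remove(best_i)
    return n

x0 = rng.choice([-1.0, 1.0], size=n)
print("random flips needed to change the class:", random_flips_to_change(x0))
print("greedy estimate of the closest-flip Hamming distance:", greedy_closest_distance(x0))
```

Averaging these two quantities over many random inputs (and random networks) is what would exhibit the reported Θ(√(n/ln n)) and Θ(n) scalings as n grows.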