How Powerful are Shallow Neural Networks with Bandlimited Random Weights?

Authors: Ming Li, Sho Sonoda, Feilong Cao, Yu Guang Wang, Jiye Liang

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. We corroborate our theoretical results with various simulation studies, and two main take-home messages are offered: (i) not every distribution for selecting random weights is suitable for building a universal approximator; (ii) a suitable assignment of random weights exists, but it is to some degree associated with the complexity of the target function. In this section, we conduct simulation studies to verify our theoretical results; two toy examples for 1D function regression are used in our experiments.
Researcher Affiliation: Academia. (1) Key Laboratory of Intelligent Education Technology and Application of Zhejiang Province, Zhejiang Normal University, Jinhua, China; (2) Deep Learning Theory Team, RIKEN AIP, Tokyo, Japan; (3) College of Sciences, China Jiliang University, Hangzhou, China; (4) Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, China; (5) School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China; (6) Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan, China. Correspondence to: Ming Li <mingli@zjnu.edu.cn>, Sho Sonoda <sho.sonoda@riken.jp>.
Pseudocode: No. This study considers a shallow neural network $g_d(x) = \sum_{j=1}^{d} c_j \sigma(a_j \cdot x - b_j)$ of input $x \in \mathbb{R}^m$ with activation function $\sigma$ and parameters $(a_j, b_j, c_j) \in \mathbb{R}^m \times \mathbb{R} \times \mathbb{R}$ for each $j \in [d] := \{1, \ldots, d\}$, trained by the following two-step random method. Step I: randomly initialize the hidden parameters $(a_j, b_j)$ according to a given data-independent probability distribution $Q(a, b)$, and freeze them. Step II: statistically determine the output parameters $c_j$ given a dataset $D_n = \{(x_i, y_i)\}_{i=1}^{n}$. The paper describes these steps in prose but does not include structured pseudocode or an algorithm block.
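The two-step random training above is simple enough to illustrate directly. The following is a minimal NumPy sketch, assuming a Gaussian sampling distribution for Q(a, b), a sigmoid activation, and ordinary least squares for Step II; the names fit_random_net and weight_scale, and these concrete choices, are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_random_net(X, y, d=200, weight_scale=1.0, seed=0):
    """Two-step random training sketch for g_d(x) = sum_j c_j * sigmoid(a_j . x - b_j).

    Step I : draw hidden parameters (a_j, b_j) from a fixed, data-independent
             distribution Q (assumed Gaussian here) and freeze them.
    Step II: determine the output weights c_j from the data, here by least squares.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    A = rng.normal(scale=weight_scale, size=(d, m))   # hidden weights a_j, frozen
    b = rng.normal(scale=weight_scale, size=d)        # hidden biases b_j, frozen
    H = sigmoid(X @ A.T - b)                          # n x d random-feature matrix
    c, *_ = np.linalg.lstsq(H, y, rcond=None)         # output weights c_j
    return A, b, c

def predict(X, A, b, c):
    return sigmoid(X @ A.T - b) @ c
```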
Open Source Code: No. The paper does not contain any explicit statement about making source code available, nor does it provide any links to a code repository.
Open Datasets: Yes. Also, we conduct another simulation study on five real-world datasets from the KEEL-dataset repository for regression tasks (https://sci2s.ugr.es/keel/).
Dataset Splits: No. For each regression task we build random nets with $\lambda$ taken as an element from the set $\{0.1, 0.5, 1, 5, 10, 50, 100, 200\}$, and choose a sufficiently large $L$ (here, $L = 10000$ in each case) so that we can observe the trend as $L \to +\infty$. In a similar way as in Simulation 1, we sample 1000 instances $\{(x_i, f(x_i))\}_{i=1}^{1000}$, which are equally spaced points on $[0, 1]$, then randomly and uniformly select 500 training samples and 500 test samples. The paper specifies training and test samples but does not explicitly mention a separate validation split or its details.
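For concreteness, here is a hypothetical sketch of the described split: 1000 equally spaced inputs on [0, 1], partitioned uniformly at random into 500 training and 500 test samples. The target function, its sigma value, and the fixed seed are illustrative placeholders.

```python
import numpy as np

# Placeholder target; see the Experiment Setup row for the reconstructed f(x; sigma).
def f(x, sigma=0.05):
    return 0.2 * np.exp(-((x - 0.4) ** 2) / sigma**2) + 0.5 * np.exp(-((x - 0.6) ** 2) / sigma**2)

x = np.linspace(0.0, 1.0, 1000)                      # 1000 equally spaced points on [0, 1]
perm = np.random.default_rng(0).permutation(1000)
x_train, y_train = x[perm[:500]], f(x[perm[:500]])   # 500 training samples
x_test,  y_test  = x[perm[500:]], f(x[perm[500:]])   # 500 test samples
# Note: the paper describes only this train/test split; no validation set is mentioned.
```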
Hardware Specification: No. The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies: No. The paper mentions general concepts related to neural networks and mathematical analysis but does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow, specific solvers) used for the experiments.
Experiment Setup: Yes. We utilize the following 1D target function in Simulation 1 and Simulation 2: $f(x; \sigma) = 0.2\exp\big(-(x-0.4)^2/\sigma^2\big) + 0.5\exp\big(-(x-0.6)^2/\sigma^2\big)$, where $x \in [0, 1]$ and $\sigma > 0$ is a scalar index that can determine the complexity of $f$, as mentioned in our theoretical analysis. In Simulations 1 and 2, we use the sigmoid activation function. For each regression task we build random nets with $\lambda$ taken as an element from the set $\{0.1, 0.5, 1, 5, 10, 50, 100, 200\}$, and choose a sufficiently large $L$ (here, $L = 10000$ in each case) so that we can observe the trend as $L \to +\infty$.
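To make the setup concrete, the sketch below instantiates the reconstructed target function and sweeps lambda over the listed set with L = 10000 hidden units. Two reconstruction assumptions are worth flagging: the placement of sigma in the exponents is inferred from the statement that sigma indexes the complexity of f, and lambda is read as the scale of the distribution from which the random hidden weights and biases are drawn (here uniform on [-lambda, lambda]); both are assumptions rather than the paper's verbatim definitions.

```python
import numpy as np

def f(x, sigma):
    # Reconstructed 1D target: two Gaussian bumps; smaller sigma -> narrower bumps -> more complex f.
    return 0.2 * np.exp(-((x - 0.4) ** 2) / sigma**2) + 0.5 * np.exp(-((x - 0.6) ** 2) / sigma**2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def test_rmse(x_tr, y_tr, x_te, y_te, lam, L=10000, seed=0):
    """Random net with L sigmoid units; hidden weights and biases drawn uniformly from
    [-lam, lam] (an assumed reading of lambda), output weights fit by least squares."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(-lam, lam, size=L)
    b = rng.uniform(-lam, lam, size=L)
    H_tr = sigmoid(np.outer(x_tr, a) - b)
    H_te = sigmoid(np.outer(x_te, a) - b)
    c, *_ = np.linalg.lstsq(H_tr, y_tr, rcond=None)
    return float(np.sqrt(np.mean((H_te @ c - y_te) ** 2)))

if __name__ == "__main__":
    sigma = 0.05                                  # illustrative complexity index
    x = np.linspace(0.0, 1.0, 1000)
    perm = np.random.default_rng(0).permutation(1000)
    x_tr, x_te = x[perm[:500]], x[perm[500:]]
    y_tr, y_te = f(x_tr, sigma), f(x_te, sigma)
    for lam in [0.1, 0.5, 1, 5, 10, 50, 100, 200]:
        print(f"lambda={lam:>6}: test RMSE = {test_rmse(x_tr, y_tr, x_te, y_te, lam):.4f}")
```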