The Expressive Power of Neural Networks: A View from the Width

Authors: Zhou Lu, Hongming Pu, Feicheng Wang, Zhiqiang Hu, Liwei Wang

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further conduct extensive experiments to provide some insights about the upper bound of such an approximation. To this end, we study a series of network architectures with varied width. For each network architecture, we randomly sample the parameters... The approximation error is empirically calculated... Table 1 lists the results.
Researcher Affiliation | Academia | Zhou Lu¹˒³ (1400010739@pku.edu.cn); Hongming Pu¹ (1400010621@pku.edu.cn); Feicheng Wang¹˒³ (1400010604@pku.edu.cn); Zhiqiang Hu² (huzq@pku.edu.cn); Liwei Wang²˒³ (wanglw@cis.pku.edu.cn). ¹ Department of Mathematics, Peking University; ² Key Laboratory of Machine Perception, MOE, School of EECS, Peking University; ³ Center for Data Science, Peking University, Beijing Institute of Big Data Research.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It refers to a figure and describes a construction informally, but it provides no formal algorithm steps.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. There are no explicit statements about releasing code, nor are there any repository links.
Open Datasets | No | The paper describes generating its own 'uniformly placed inputs' and sampling parameters, but it does not provide concrete access information (link, DOI, repository name, or formal citation with authors/year) for a publicly available or open dataset.
Dataset Splits | No | The paper states, 'half of all the test inputs from [-1, 1)^n and the corresponding values evaluated by target function constitute the training set.' This describes a training split, but it does not mention a distinct validation set or specify a comprehensive train/validation/test split. The first sketch after the table illustrates one way such a split can be constructed.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions a 'mini-batch AdaDelta optimizer' but does not provide specific version numbers for any software components, which is required for reproducibility.
Experiment Setup | Yes | The training set is used to train the approximator network with a mini-batch AdaDelta optimizer and learning rate 1.0. The parameters of the approximator network are randomly initialized according to [8]. Training proceeds for 100 epochs for n = 1 and 200 epochs for n = 2; the best approximator function is recorded. The second sketch after the table illustrates this training loop.
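
As a rough illustration of the evaluation data described in the Open Datasets and Dataset Splits rows, the following is a minimal sketch, assuming PyTorch, a uniform grid on [-1, 1)^n with n = 1, and a small fully connected ReLU network with random parameters standing in for the target function; the names and sizes (make_target_net, grid_size, width, depth) are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

def make_target_net(n, width=2, depth=3):
    """Hypothetical target network: a fully connected ReLU net with random parameters."""
    layers, in_dim = [], n
    for _ in range(depth):
        layers += [nn.Linear(in_dim, width), nn.ReLU()]
        in_dim = width
    layers.append(nn.Linear(in_dim, 1))
    return nn.Sequential(*layers)

n = 1                                  # input dimension (the paper reports n = 1 and n = 2)
grid_size = 1000                       # number of uniformly placed inputs (assumed)
xs = torch.linspace(-1.0, 1.0 - 2.0 / grid_size, grid_size).unsqueeze(1)  # uniform grid on [-1, 1)

target = make_target_net(n)
with torch.no_grad():
    ys = target(xs)                    # target values at the uniformly placed inputs

# Half of the inputs and their target values form the training set; the rest are held out.
train_x, test_x = xs[::2], xs[1::2]
train_y, test_y = ys[::2], ys[1::2]
```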
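The Experiment Setup row quotes a training protocol: mini-batch AdaDelta with learning rate 1.0, random initialization following [8] (a Glorot/Xavier-style scheme), 100 epochs for n = 1 or 200 for n = 2, and recording the best approximator. Below is a minimal PyTorch sketch of that loop, continuing the variables from the previous sketch; the approximator's width and depth and the batch size are assumptions, not values from the paper.

```python
import copy
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Continues n, train_x, train_y, test_x, test_y from the previous sketch.

def make_approximator(n, width=16, depth=3):
    """Hypothetical approximator network with Xavier-style initialization (cf. [8] in the paper)."""
    layers, in_dim = [], n
    for _ in range(depth):
        layers += [nn.Linear(in_dim, width), nn.ReLU()]
        in_dim = width
    layers.append(nn.Linear(in_dim, 1))
    net = nn.Sequential(*layers)
    for m in net.modules():
        if isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)
            nn.init.zeros_(m.bias)
    return net

approximator = make_approximator(n)
optimizer = torch.optim.Adadelta(approximator.parameters(), lr=1.0)  # mini-batch AdaDelta, lr 1.0
loader = DataLoader(TensorDataset(train_x, train_y), batch_size=32, shuffle=True)  # batch size assumed
epochs = 100 if n == 1 else 200

best_err, best_state = float("inf"), None
for _ in range(epochs):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(approximator(xb), yb)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        # Empirical approximation error on the held-out half of the uniformly placed inputs.
        err = (approximator(test_x) - test_y).abs().mean().item()
    if err < best_err:                 # keep the best approximator seen during training
        best_err, best_state = err, copy.deepcopy(approximator.state_dict())
```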