Approximation with CNNs in Sobolev Space: with Applications to Classification

Authors: Guohao Shen, Yuling Jiao, Yuanyuan Lin, Jian Huang

NeurIPS 2022

Reproducibility Variable Result LLM Response
Research Type Theoretical We derive a novel approximation error bound with an explicit prefactor for Sobolev-regular functions using deep convolutional neural networks (CNNs). The bound is non-asymptotic in terms of the network depth and filter lengths, in a rather flexible way. For Sobolev-regular functions which can be embedded into the Hölder space, the prefactor of our error bound depends on the ambient dimension polynomially instead of exponentially as in most existing results, which is of independent interest. We also establish a new approximation result when the target function is supported on an approximate lower-dimensional manifold. We apply our results to establish non-asymptotic excess risk bounds for classification using CNNs with convex surrogate losses, including the cross-entropy loss, the hinge loss, the logistic loss, the exponential loss and the least squares loss. We show that classification methods with CNNs can circumvent the curse of dimensionality if the input data are supported on a neighborhood of a low-dimensional manifold.
Researcher Affiliation Academia Guohao Shen, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong SAR, China (guohao.shen@polyu.edu.hk); Yuling Jiao, School of Mathematics and Statistics and Hubei Key Laboratory of Computational Science, Wuhan University, Wuhan 430072, China (yulingjiaomath@whu.edu.cn); Yuanyuan Lin, Department of Statistics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China (ylin@sta.cuhk.edu.hk); Jian Huang, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong SAR, China (j.huang@polyu.edu.hk)
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks. It references a detailed description of operations in the appendix but does not provide pseudocode.
Open Source Code No The paper does not provide concrete access to source code for the methodology it describes.
Open Datasets No The paper is theoretical and does not conduct experiments on a public dataset. It refers to 'input data' and 'data distribution' in a theoretical context but does not provide access information for a specific training dataset.
Dataset Splits No The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning for empirical validation, as it is a theoretical paper.
Hardware Specification No The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments, as it is a theoretical paper.
Software Dependencies No The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment, as it is a theoretical paper.
Experiment Setup No The paper does not contain specific experimental setup details (concrete hyperparameter values, training configurations, or system-level settings) in the main text, as it is a theoretical paper.
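The abstract names several standard convex surrogate losses used in the paper's excess risk bounds. Since the paper is theoretical and provides no code, the following is a minimal illustrative sketch (not the authors' implementation) of those losses written as functions of the classification margin m = y·f(x), with labels y ∈ {−1, +1}:

```python
import math

# Convex surrogate losses for binary classification, as named in the abstract.
# Each is a function of the margin m = y * f(x); for binary labels with a
# sigmoid output, the cross-entropy loss reduces to the logistic loss.

def hinge(m: float) -> float:
    """Hinge loss: max(0, 1 - m)."""
    return max(0.0, 1.0 - m)

def logistic(m: float) -> float:
    """Logistic loss: log(1 + exp(-m))."""
    return math.log(1.0 + math.exp(-m))

def exponential(m: float) -> float:
    """Exponential loss: exp(-m)."""
    return math.exp(-m)

def least_squares(m: float) -> float:
    """Least squares loss on the margin: (1 - m)^2."""
    return (1.0 - m) ** 2

# All four are convex and, up to scaling, upper-bound the 0-1 loss,
# which is what makes excess risk bounds via surrogates possible.
```

Each loss penalizes margins below 1 (or below 0, for the logistic and exponential losses) and vanishes or flattens for confidently correct predictions, which is the property the surrogate risk analysis relies on.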