New Complexity-Theoretic Frontiers of Tractability for Neural Network Training

Authors: Cornelius Brand, Robert Ganian, Mathis Rocton

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this article we obtain novel algorithmic upper bounds for training linear- and ReLU-activated neural networks to optimality, which push the boundaries of tractability for these problems beyond the previous state of the art. In particular, for ReLU networks we establish the polynomial-time tractability of all architectures where hidden neurons have an out-degree of 1, improving upon the previous algorithm of Arora, Basu, Mianjy and Mukherjee. On the other hand, for networks with linear activation functions we identify the first non-trivial polynomial-time solvable class of networks by obtaining an algorithm that can optimally train network architectures satisfying a novel data throughput condition. ... We have examined the theoretical boundaries of computational tractability for training neural networks both for the fundamental cases of ReLU and linear activations, complementing a flurry of recent results establishing increasingly stronger lower bounds for network training. (The out-degree-1 condition on hidden neurons is illustrated in the sketch after this table.)
Researcher Affiliation | Academia | Cornelius Brand, Algorithms & Complexity Group, Vienna University of Technology, Favoritenstraße 9-11, 1040 Vienna, Austria, cbrand@ac.tuwien.ac.at; Robert Ganian, Algorithms & Complexity Group, Vienna University of Technology, Favoritenstraße 9-11, 1040 Vienna, Austria, rganian@gmail.com; Mathis Rocton, Algorithms & Complexity Group, Vienna University of Technology, Favoritenstraße 9-11, 1040 Vienna, Austria, mrocton@ac.tuwien.ac.at
Pseudocode | No | The paper describes algorithms and proof sketches verbally and through theorems and lemmas (e.g., "The algorithm begins by exhaustively branching..."), but it does not include any formal pseudocode blocks or clearly labeled algorithm sections.
Open Source Code | No | The paper does not contain any statement about releasing source code for the methodology described, nor does it provide a link to a code repository.
Open Datasets | No | The paper is theoretical and focuses on computational complexity. It discusses neural network training in a formal context (e.g., "given a training data set containing m samples") but does not describe using or providing access to any specific dataset for empirical evaluation.
Dataset Splits | No | The paper is theoretical and does not present empirical experiments. Therefore, there are no mentions of specific training, validation, or test dataset splits.
Hardware Specification | No | The paper is theoretical and does not report on empirical experiments that would require specific hardware. Therefore, no hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not report on empirical experiments or implementations that would require specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not present empirical experiments. Therefore, it does not include details about an experimental setup, hyperparameters, or training configurations.
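
The architectural condition named in the abstract (every hidden neuron has out-degree 1) can be phrased as a simple structural check on the network graph. The sketch below is illustrative only and is not taken from the paper: the adjacency-list representation, the function name hidden_neurons_have_out_degree_one, and the example network are assumptions made here for illustration, and the paper's polynomial-time training algorithm itself is not reproduced.

# Illustrative sketch only: checks the structural condition quoted in the
# abstract (every hidden neuron has out-degree 1); it does NOT implement
# the paper's training algorithm.
def hidden_neurons_have_out_degree_one(edges, hidden):
    """Return True if every hidden neuron feeds into exactly one other neuron.

    edges  -- dict mapping each neuron to the list of neurons it feeds into
    hidden -- set of neurons that are neither input nor output neurons
    """
    return all(len(edges.get(v, [])) == 1 for v in hidden)

# Example: two inputs, two hidden ReLU neurons, one output. Every hidden
# neuron has out-degree exactly 1, so this architecture lies in the class
# the abstract describes as polynomial-time trainable.
edges = {
    "x1": ["h1"],
    "x2": ["h1", "h2"],   # input neurons may have arbitrary out-degree
    "h1": ["out"],
    "h2": ["out"],        # hidden neurons: out-degree exactly 1
}
print(hidden_neurons_have_out_degree_one(edges, hidden={"h1", "h2"}))  # prints True

Under this check, the example passes; an architecture in which h1 additionally fed into h2 would have a hidden neuron of out-degree 2 and would fall outside the class covered by the stated tractability result.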