On the complexity of nonsmooth automatic differentiation
Authors: Jérôme Bolte, Ryan Boustany, Edouard Pauwels, Béatrice Pesquet-Popescu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We provide a simple computational model for addressing complexity-theoretic questions about nonsmooth numerical programs. For the backward mode, we prove a cheap conservative gradient principle à la Baur-Strassen, generalizing state-of-the-art results to the nonsmooth programs that model most NNs. We establish that, regardless of dimension, the computational cost of a conservative gradient is of the order of that of a function evaluation. Our results provide a theoretical validation of the fact that the cost of backpropagation does not depend on the program's smoothness (see the illustrative backpropagation sketch after this table). For the forward mode, we relate the computational cost of p directional derivatives to that of p × p matrix multiplication, and we provide lower complexity bounds that illustrate the limits to which this deficiency may be improved. We establish that computing two distinct elements of the Clarke subdifferential at a given point is NP-hard for simple ReLU programs; this result also applies to the lexicographic subdifferential. In contrast, we show that the problem can be solved in polynomial time for conservative gradients. This reflects the computational difficulty of dealing with the Clarke subdifferential. A result of independent interest: deciding differentiability of a ReLU program at a point is NP-hard. |
| Researcher Affiliation | Collaboration | Jérôme Bolte (1,2), Ryan Boustany (1,2,4), Edouard Pauwels (2,3) & Béatrice Pesquet-Popescu (4). (1) Toulouse School of Economics; (2) Université de Toulouse; (3) IRIT, CNRS, Institut Universitaire de France (IUF); (4) Thales LAS France. {jerome.bolte, ryan.boustany}@ut-capitole.fr, edouard.pauwels@irit.fr, beatrice.pesquet@thalesgroup.com |
| Pseudocode | Yes | Algorithm 1: Program. Data: p, q, m, pr, (g_i)_{i=p+1}^m. Input: x = (x_1, ..., x_p). 1: for i = p+1, p+2, ..., m do 2: x_i = g_i(x_{pr(i)}), where 3: x_{pr(i)} = (x_j)_{j ∈ pr(i)}. 4: end for. Return: y := (x_j)_{j=m-q+1}^m. (A runnable Python sketch of this evaluation model appears after this table.) |
| Open Source Code | No | The paper mentions external libraries like TensorFlow, PyTorch, and JAX that are open source, but does not provide an explicit statement or link to the authors' own source code for the methodology described in the paper. |
| Open Datasets | No | This paper is theoretical and does not conduct experiments with datasets. It refers to classes of programs and functions (e.g., 'ReLU programs') but does not use or reference any particular dataset, public or otherwise. |
| Dataset Splits | No | As a theoretical paper, no experimental data splits (training, validation, or test) are provided or discussed. |
| Hardware Specification | No | This paper is theoretical and reports no computational hardware specifications, since no experiments are run. |
| Software Dependencies | No | The paper refers to existing numerical libraries (TensorFlow, PyTorch, JAX) as context, but does not list specific software dependencies with version numbers required to reproduce any experimental setup, as it is a theoretical work. |
| Experiment Setup | No | This paper focuses on theoretical analysis and does not describe any specific experimental setup, hyperparameters, or system-level training settings. |
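
For concreteness, the following is a minimal Python sketch of the evaluation model in Algorithm 1 (referenced in the Pseudocode row above). It is an illustrative reading of the paper's model, not the authors' code; the names `evaluate_program`, `pr`, and `g` are introduced here for the demo.

```python
# Hypothetical sketch of the paper's program model (Algorithm 1):
# a program is given by input size p, output size q, total length m,
# a predecessor map pr, and elementary functions (g_i), i = p+1, ..., m.

def evaluate_program(p, q, m, pr, g, x):
    """Evaluate a program on input x = (x_1, ..., x_p).

    pr[i] lists the predecessor indices of node i, and g[i] is the
    elementary function applied at node i (1-based, as in the paper).
    """
    assert len(x) == p
    vals = {j + 1: x[j] for j in range(p)}         # nodes 1..p hold the input
    for i in range(p + 1, m + 1):                  # i = p+1, p+2, ..., m
        vals[i] = g[i](*(vals[j] for j in pr[i]))  # x_i = g_i(x_{pr(i)})
    return tuple(vals[j] for j in range(m - q + 1, m + 1))  # last q nodes

# Example: a small ReLU program computing relu(relu(x1 + x2) - x3).
relu = lambda t: max(t, 0.0)
p, q, m = 3, 1, 7
pr = {4: (1, 2), 5: (4,), 6: (5, 3), 7: (6,)}
g = {4: lambda a, b: a + b, 5: relu, 6: lambda a, b: a - b, 7: relu}
print(evaluate_program(p, q, m, pr, g, (1.0, 2.0, 1.5)))  # (1.5,)
```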
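
The Research Type summary points to the paper's cheap conservative gradient principle: a backward sweep costs on the order of one forward evaluation even when the program is nonsmooth, and at a kink any admissible selection of the elementary "derivative" composes into an element of a conservative gradient. The sketch below illustrates this on a chain of ReLUs; it is a simplified illustration under that reading, not the paper's algorithm, and the selection parameter `s` is an assumption introduced for the demo.

```python
# Reverse-mode sweep through relu(relu(...relu(x)...)), illustrating
# that the backward pass has the same order of cost as the forward
# pass, and that at the kink x = 0 any slope s in [0, 1] may be
# selected as relu's "derivative" (conservative calculus).

def relu(t):
    return max(t, 0.0)

def relu_deriv(t, s=0.0):
    # s is the selected conservative derivative at the kink t == 0.
    if t > 0.0:
        return 1.0
    if t < 0.0:
        return 0.0
    return s

def backprop_relu_chain(x, depth, s=0.0):
    # Forward sweep: store the intermediate values.
    vals = [x]
    for _ in range(depth):
        vals.append(relu(vals[-1]))
    # Backward sweep: O(depth), the same order as the forward sweep.
    grad = 1.0
    for t in reversed(vals[:-1]):
        grad *= relu_deriv(t, s)
    return vals[-1], grad

# At x = 0, different selections give different conservative gradient
# elements for the very same function (libraries such as TensorFlow
# and PyTorch conventionally select s = 0 for relu at 0):
print(backprop_relu_chain(0.0, depth=3, s=0.0))  # (0.0, 0.0)
print(backprop_relu_chain(0.0, depth=3, s=1.0))  # (0.0, 1.0)
```

Both printed values are legitimate conservative gradient outputs at the same point, which is consistent with the paper's contrast: one conservative gradient element is cheap to compute, whereas deciding whether two distinct Clarke subgradients exist at a point is NP-hard for ReLU programs.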