On the complexity of nonsmooth automatic differentiation

Authors: Jérôme Bolte, Ryan Boustany, Edouard Pauwels, Béatrice Pesquet-Popescu

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We provide a simple computational model for addressing questions of complexity theory for nonsmooth numerical programs. For the backward mode, we prove a cheap conservative gradient principle à la Baur-Strassen, generalizing the state of the art to the nonsmooth programs that model most neural networks. We establish that, regardless of dimension, the computational cost of a conservative gradient is of the order of that of a function evaluation. Our results provide a theoretical validation of the fact that the cost of backpropagation does not depend on the program's smoothness. For the forward mode, we relate the computational cost of p directional derivatives to that of p × p matrix multiplication, and we provide lower complexity bounds that illustrate the limits to which this deficiency may be improved. We establish that computing two distinct elements of the Clarke subdifferential at a given point is NP-hard for simple ReLU programs; this result also applies to the lexicographic subdifferential. In contrast, we show that the problem can be solved in polynomial time for conservative gradients, which reflects the computational difficulty of dealing with the Clarke subdifferential. A result of independent interest: deciding differentiability of a ReLU program at a point is NP-hard.
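To make the cost picture concrete, here is a minimal sketch (our illustration, not code from the paper) in JAX: reverse mode returns an element of a conservative gradient of a nonsmooth ReLU program in a single backward pass, at a cost comparable to one evaluation of the function, while assembling the same p-dimensional gradient with forward mode takes one JVP per canonical direction. The program f and the evaluation point are hypothetical examples chosen to be nonsmooth at x = 0.

```python
# Minimal sketch (not from the paper): reverse vs. forward mode on a
# nonsmooth ReLU program, in the spirit of the cheap conservative
# gradient principle.
import jax
import jax.numpy as jnp

def f(x):
    # A simple nonsmooth "ReLU program": affine maps composed with ReLU.
    h = jnp.maximum(x, 0.0)                   # nonsmooth at 0
    return jnp.sum(jnp.maximum(2.0 * h - 1.0, 0.0))

p = 5
x = jnp.zeros(p)                              # a genuinely nonsmooth point

# Reverse mode: one backward pass, cost O(cost(f)) regardless of p.
g_reverse = jax.grad(f)(x)

# Forward mode: p JVP passes, one per canonical direction e_1, ..., e_p.
g_forward = jnp.stack(
    [jax.jvp(f, (x,), (jnp.eye(p)[i],))[1] for i in range(p)]
)

print(g_reverse)   # derivative element returned by backpropagation
print(g_forward)   # directional derivatives assembled coordinate by coordinate
```

At the nonsmooth point x = 0, both modes return whatever convention the library's elementwise maximum implements for ties; the point of the sketch is the cost structure, not the particular subgradient selected.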
Researcher Affiliation | Collaboration | Jérôme Bolte (1,2), Ryan Boustany (1,2,4), Edouard Pauwels (2,3) & Béatrice Pesquet-Popescu (4). 1: Toulouse School of Economics; 2: Université de Toulouse; 3: IRIT, CNRS, Institut Universitaire de France (IUF); 4: Thales LAS France. {jerome.bolte, ryan.boustany}@ut-capitole.fr, edouard.pauwels@irit.fr, beatrice.pesquet@thalesgroup.com
Pseudocode | Yes | Algorithm 1: Program. Data: p, q, m, pr, (g_i)_{i=p+1}^{m}. Input: x = (x_1, ..., x_p). 1: for i = p+1, p+2, ..., m do; 2: x_i = g_i(x_{pr(i)}), where 3: x_{pr(i)} = (x_j)_{j ∈ pr(i)}; 4: end for. Return: y := (x_j)_{j=m-q+1}^{m}.
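For concreteness, a minimal Python rendering of this evaluation model follows (our own sketch, not the authors' code); run_program and the two-layer ReLU example are names and inputs we introduce for illustration.

```python
# Sketch of Algorithm 1: m variables, the first p are inputs, each
# subsequent x_i applies g_i to the parent values x_{pr(i)}, and the
# last q variables are returned as the output.
from typing import Callable, Dict, List, Sequence, Tuple

def run_program(
    p: int,
    q: int,
    m: int,
    pr: Dict[int, Tuple[int, ...]],        # parents pr(i) of each i > p
    g: Dict[int, Callable[..., float]],    # elementary functions g_i
    x_in: Sequence[float],                 # input x = (x_1, ..., x_p)
) -> List[float]:
    x = {i + 1: float(v) for i, v in enumerate(x_in)}   # 1-based indexing
    for i in range(p + 1, m + 1):
        x[i] = g[i](*(x[j] for j in pr[i]))             # x_i = g_i(x_{pr(i)})
    return [x[j] for j in range(m - q + 1, m + 1)]      # y = (x_j)_{j=m-q+1}^m

# Example: a small ReLU program computing relu(relu(x1 + x2) - x3).
relu = lambda t: max(t, 0.0)
y = run_program(
    p=3, q=1, m=7,
    pr={4: (1, 2), 5: (4,), 6: (5, 3), 7: (6,)},
    g={4: lambda a, b: a + b, 5: relu, 6: lambda a, b: a - b, 7: relu},
    x_in=[1.0, 2.0, 1.0],
)
print(y)  # [2.0]
```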
Open Source Code | No | The paper mentions open-source libraries such as TensorFlow, PyTorch, and JAX, but does not provide an explicit statement or link to the authors' own source code for the methodology described in the paper.
Open Datasets | No | This paper is theoretical and does not conduct experiments with datasets. It refers to types of programs and functions (e.g., 'ReLU programs') but does not specify any particular dataset for training or public availability.
Dataset Splits | No | As this is a theoretical paper, no experimental data splits (training, validation, or test) are provided or discussed.
Hardware Specification | No | This paper is theoretical and does not detail any experimental setup or computational hardware specifications for running experiments.
Software Dependencies | No | The paper refers to existing numerical libraries (TensorFlow, PyTorch, JAX) as context, but does not list specific software dependencies with version numbers required to reproduce any experimental setup, as it is a theoretical work.
Experiment Setup | No | This paper focuses on theoretical analysis and does not describe any specific experimental setup, hyperparameters, or system-level training settings.