Parallel Deep Neural Networks Have Zero Duality Gap

Authors: Yifei Wang, Tolga Ergen, Mert Pilanci

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this paper, we prove that the duality gap for deeper linear networks with vector outputs is non-zero. In contrast, we show that zero duality gap can be obtained by stacking standard deep networks in parallel, which we call a parallel architecture, and modifying the regularization. Therefore, we prove strong duality and the existence of equivalent convex problems that enable globally optimal training of deep networks. As a by-product of our analysis, we demonstrate that weight decay regularization on the network parameters explicitly encourages low-rank solutions via closed-form expressions. In addition, we show that strong duality holds for three-layer standard ReLU networks given rank-1 data matrices. (An illustrative sketch of such a parallel architecture is given after the table.)
Researcher Affiliation | Academia | Yifei Wang, Tolga Ergen & Mert Pilanci, Department of Electrical Engineering, Stanford University, {wangyf18,ergen,pilanci}@stanford.edu
Pseudocode | No | The paper provides mathematical derivations and proofs but no pseudocode or algorithm blocks.
Open Source Code | No | The paper is theoretical and does not mention releasing any source code for its methods.
Open Datasets | No | The paper is theoretical and does not use or reference specific datasets for training or evaluation.
Dataset Splits | No | The paper is theoretical and does not describe any dataset splits for validation.
Hardware Specification | No | The paper is theoretical and does not describe any hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not list any software dependencies with version numbers for experimental reproducibility.
Experiment Setup | No | The paper is theoretical and does not provide details about an experimental setup, such as hyperparameters or training settings.
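The following is a minimal sketch, not the authors' code, of the "parallel architecture" described in the abstract: several standard deep ReLU sub-networks are stacked in parallel and their outputs summed, trained here with plain weight decay in PyTorch. The class name ParallelNet, the branch count, and all hyperparameters are illustrative assumptions; the paper's modified regularization and its equivalent convex reformulation are not reproduced.

```python
# Illustrative sketch only (assumed names and hyperparameters): K standard
# L-layer ReLU sub-networks stacked in parallel, outputs summed, trained with
# ordinary weight decay. The paper's modified regularization that yields zero
# duality gap is NOT implemented here.
import torch
import torch.nn as nn


class ParallelNet(nn.Module):
    def __init__(self, d_in: int, d_hidden: int, d_out: int,
                 depth: int = 3, k_branches: int = 8):
        super().__init__()

        def branch() -> nn.Sequential:
            # One standard depth-layer ReLU sub-network.
            layers = [nn.Linear(d_in, d_hidden), nn.ReLU()]
            for _ in range(depth - 2):
                layers += [nn.Linear(d_hidden, d_hidden), nn.ReLU()]
            layers += [nn.Linear(d_hidden, d_out)]
            return nn.Sequential(*layers)

        self.branches = nn.ModuleList([branch() for _ in range(k_branches)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The parallel architecture outputs the sum of its sub-network outputs.
        return torch.stack([b(x) for b in self.branches], dim=0).sum(dim=0)


if __name__ == "__main__":
    net = ParallelNet(d_in=10, d_hidden=32, d_out=1)
    x, y = torch.randn(64, 10), torch.randn(64, 1)
    # weight_decay here is plain squared-l2 regularization on all parameters;
    # the paper modifies this regularization to obtain strong duality.
    opt = torch.optim.SGD(net.parameters(), lr=1e-2, weight_decay=1e-3)
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()
    print(float(loss))
```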