Restricted Strong Convexity of Deep Learning Models with Smooth Activations

Authors: Arindam Banerjee, Pedro Cisneros-Velarde, Libin Zhu, Mikhail Belkin

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We share preliminary experimental results supporting our theoretical advances. [...] In this section, we present experimental results verifying the RSC condition [...] on standard benchmarks: CIFAR-10, MNIST, and Fashion-MNIST.
Researcher Affiliation | Academia | Arindam Banerjee, Department of Computer Science, University of Illinois at Urbana-Champaign, arindamb@illinois.edu; Pedro Cisneros-Velarde, Department of Computer Science, University of Illinois at Urbana-Champaign, pacisne@gmail.com; Libin Zhu, Department of Computer Science, University of California, San Diego, l5zhu@ucsd.edu; Mikhail Belkin, Halıcıoğlu Data Science Institute, University of California, San Diego, mbelkin@ucsd.edu
Pseudocode | No | The paper focuses on theoretical analysis and proofs but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing code or a link to a code repository for the methodology described.
Open Datasets | Yes | In this section, we present experimental results verifying the RSC condition [...] on standard benchmarks: CIFAR-10, MNIST, and Fashion-MNIST.
Dataset Splits | No | The paper mentions '512 randomly chosen training points' and a 'training algorithm' with 'stopping criteria', but it does not give explicit percentages or counts for training, validation, and test splits, nor does it refer to standard predefined splits for the datasets beyond naming them.
Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, programming languages, or libraries used in the experiments.
Experiment Setup | Yes | For the experiments, the network architecture we used is a 3-layer fully connected neural network with tanh activation function. The training algorithm is gradient descent (GD) with a constant learning rate, chosen appropriately to keep the training in the NTK regime. Since we are using GD, we use 512 randomly chosen training points for the experiments. The stopping criterion is either training loss < 10^-3 or number of iterations larger than 3000.
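Although the paper releases no code, the Experiment Setup row above pins down enough of the configuration to sketch it. The snippet below is a minimal reconstruction under stated assumptions, not the authors' implementation: PyTorch and torchvision, MNIST as the benchmark, a hidden width of 512, MSE loss on one-hot targets, and a placeholder learning rate are all assumptions; only the 3-layer fully connected tanh architecture, full-batch gradient descent with a constant learning rate, the 512 randomly chosen training points, and the stopping criteria (training loss below 10^-3 or 3000 iterations) come from the quoted setup.

```python
# Minimal sketch of the reported setup; not the authors' code.
# Assumed: PyTorch + torchvision, MNIST, hidden width 512, MSE loss on
# one-hot targets, and a placeholder constant learning rate.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms

torch.manual_seed(0)

# 512 randomly chosen training points, as stated in the quoted setup.
train = datasets.MNIST("./data", train=True, download=True,
                       transform=transforms.ToTensor())
idx = torch.randperm(len(train))[:512].tolist()
X = torch.stack([train[i][0].flatten() for i in idx])                 # (512, 784)
y = F.one_hot(torch.tensor([train[i][1] for i in idx]), 10).float()  # (512, 10)

# 3-layer fully connected network with tanh activations.
width = 512  # assumption: the quoted setup does not state the hidden width
model = nn.Sequential(
    nn.Linear(X.shape[1], width), nn.Tanh(),
    nn.Linear(width, width), nn.Tanh(),
    nn.Linear(width, 10),
)

loss_fn = nn.MSELoss()
lr = 1.0  # placeholder; the paper only says the constant rate is "chosen appropriately"

# Full-batch gradient descent with the quoted stopping criteria:
# stop once training loss < 1e-3 or after 3000 iterations.
for step in range(3000):
    loss = loss_fn(model(X), y)
    if loss.item() < 1e-3:
        break
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad

print(f"stopped at step {step} with training loss {loss.item():.4f}")
```

Swapping in CIFAR-10 or Fashion-MNIST only changes the dataset class and the input dimension; the learning rate above stands in for whatever value keeps training in the NTK regime, which the paper does not report.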