Restricted Strong Convexity of Deep Learning Models with Smooth Activations
Authors: Arindam Banerjee, Pedro Cisneros-Velarde, Libin Zhu, Misha Belkin
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We share preliminary experimental results supporting our theoretical advances. [...] In this section, we present experimental results verifying the RSC condition [...] on standard benchmarks: CIFAR-10, MNIST, and Fashion-MNIST. |
| Researcher Affiliation | Academia | Arindam Banerjee Department of Computer Science University of Illinois at Urbana-Champaign arindamb@illinois.edu; Pedro Cisneros-Velarde Department of Computer Science University of Illinois at Urbana-Champaign pacisne@gmail.com; Libin Zhu Department of Computer Science University of California, San Diego l5zhu@ucsd.edu; Mikhail Belkin Halıcıoğlu Data Science Institute University of California, San Diego mbelkin@ucsd.edu |
| Pseudocode | No | The paper focuses on theoretical analysis and proofs but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing code or links to a code repository for the methodology described. |
| Open Datasets | Yes | In this section, we present experimental results verifying the RSC condition [...] on standard benchmarks: CIFAR-10, MNIST, and Fashion-MNIST. |
| Dataset Splits | No | The paper mentions '512 randomly chosen training points' and a 'training algorithm' with 'stopping criteria', but it does not specify explicit percentages or counts for training, validation, and test splits, nor does it refer to standard predefined splits for the datasets used beyond just naming them. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, programming languages, or libraries used in the experiments. |
| Experiment Setup | Yes | For the experiments, the network architecture we used was a 3-layer fully connected neural network with tanh activation function. The training algorithm is gradient descent (GD) with a constant learning rate, chosen appropriately to keep the training in the NTK regime. Since we are using GD, we use 512 randomly chosen training points for the experiments. The stopping criterion is either training loss < 10⁻³ or the number of iterations exceeding 3000. |
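
The quoted setup translates into a short full-batch training loop. The sketch below is a minimal, hypothetical PyTorch reconstruction: only the depth (3 layers), tanh activation, full-batch GD with a constant learning rate, the 512-point training subset, and the stopping criteria come from the paper's description; the hidden width, learning rate value, squared loss, and the choice of CIFAR-10 as the dataset are illustrative assumptions.

```python
# Minimal sketch of the quoted experimental setup (assumptions noted inline).
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

torch.manual_seed(0)

# 512 randomly chosen training points (CIFAR-10 chosen here for illustration),
# flattened so they can be fed to a fully connected network.
transform = T.Compose([T.ToTensor(), T.Lambda(lambda x: x.flatten())])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform
)
idx = torch.randperm(len(train_set))[:512]
X = torch.stack([train_set[i][0] for i in idx])
y = torch.tensor([train_set[i][1] for i in idx])
Y = torch.nn.functional.one_hot(y, num_classes=10).float()

# 3-layer fully connected network with tanh activations (width is an assumption).
width = 1024
model = nn.Sequential(
    nn.Linear(X.shape[1], width), nn.Tanh(),
    nn.Linear(width, width), nn.Tanh(),
    nn.Linear(width, 10),
)

# Full-batch gradient descent with a constant learning rate (value assumed);
# squared loss on one-hot targets is a common choice in NTK-regime experiments.
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
loss_fn = nn.MSELoss()

# Stop when training loss < 1e-3 or after 3000 iterations, as in the quoted setup.
for step in range(3000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), Y)
    if loss.item() < 1e-3:
        break
    loss.backward()
    optimizer.step()
```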