Restricted Strong Convexity of Deep Learning Models with Smooth Activations
Authors: Arindam Banerjee, Pedro Cisneros-Velarde, Libin Zhu, Misha Belkin
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We share preliminary experimental results supporting our theoretical advances. [...] In this section, we present experimental results verifying the RSC condition [...] on standard benchmarks: CIFAR-10, MNIST, and Fashion-MNIST. |
| Researcher Affiliation | Academia | Arindam Banerjee Department of Computer Science University of Illinois at Urbana-Champaign arindamb@illinois.edu; Pedro Cisneros-Velarde Department of Computer Science University of Illinois at Urbana-Champaign pacisne@gmail.com; Libin Zhu Department of Computer Science University of California, San Diego l5zhu@ucsd.edu; Mikhail Belkin Halıcıoğlu Data Science Institute University of California, San Diego mbelkin@ucsd.edu |
| Pseudocode | No | The paper focuses on theoretical analysis and proofs but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing code or links to a code repository for the methodology described. |
| Open Datasets | Yes | In this section, we present experimental results verifying the RSC condition [...] on standard benchmarks: CIFAR-10, MNIST, and Fashion-MNIST. |
| Dataset Splits | No | The paper mentions '512 randomly chosen training points' and a 'training algorithm' with 'stopping criteria', but it does not specify explicit percentages or counts for training, validation, and test splits, nor does it refer to standard predefined splits for the datasets used beyond just naming them. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, programming languages, or libraries used in the experiments. |
| Experiment Setup | Yes | For the experiments, the network architecture we used was a 3-layer fully connected neural network with tanh activation function. The training algorithm is gradient descent (GD) with a constant learning rate, chosen appropriately to keep the training in the NTK regime. Since we are using GD, we use 512 randomly chosen training points for the experiments. The stopping criterion is either training loss < 10⁻³ or the number of iterations exceeding 3000. |
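
The quoted setup translates into a short full-batch training loop. The sketch below is a minimal, hypothetical PyTorch reconstruction: only the depth (3 layers), tanh activation, full-batch GD with a constant learning rate, the 512-point training subset, and the stopping criteria come from the paper's description; the hidden width, learning rate value, squared loss, and the choice of CIFAR-10 as the dataset are illustrative assumptions.

```python
# Minimal sketch of the quoted experimental setup (assumptions noted inline).
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

torch.manual_seed(0)

# 512 randomly chosen training points (CIFAR-10 chosen here for illustration),
# flattened so they can be fed to a fully connected network.
transform = T.Compose([T.ToTensor(), T.Lambda(lambda x: x.flatten())])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform
)
idx = torch.randperm(len(train_set))[:512]
X = torch.stack([train_set[i][0] for i in idx])
y = torch.tensor([train_set[i][1] for i in idx])
Y = torch.nn.functional.one_hot(y, num_classes=10).float()

# 3-layer fully connected network with tanh activations (width is an assumption).
width = 1024
model = nn.Sequential(
    nn.Linear(X.shape[1], width), nn.Tanh(),
    nn.Linear(width, width), nn.Tanh(),
    nn.Linear(width, 10),
)

# Full-batch gradient descent with a constant learning rate (value assumed);
# squared loss on one-hot targets is a common choice in NTK-regime experiments.
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)
loss_fn = nn.MSELoss()

# Stop when training loss < 1e-3 or after 3000 iterations, as in the quoted setup.
for step in range(3000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), Y)
    if loss.item() < 1e-3:
        break
    loss.backward()
    optimizer.step()
```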