Fast Convergence in Learning Two-Layer Neural Networks with Separable Data

Authors: Hossein Taheri, Christos Thrampoulidis

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical Experiments: In this section, we demonstrate the empirical performance of normalized GD. Figure 1 illustrates the training loss (left), the test error % (middle), and the weight norm (right) of GD and normalized GD. The experiments are conducted on a two-layer neural network with m = 50 hidden neurons with the leaky-ReLU activation function in (6), where α = 0.2 and = 1. The second-layer weights are chosen randomly from a_j ∈ {±1/√m} and kept fixed during training and test time. The first-layer weights are initialized from the standard Gaussian distribution and then normalized to unit norm. We consider binary classification with the exponential loss using digits 0 and 1 from the MNIST dataset (d = 784) and we set the sample size to n = 1000.
Researcher Affiliation | Academia | 1 University of California, Santa Barbara; 2 University of British Columbia; hossein@ucsb.edu, cthrampo@ece.ubc.ca
Pseudocode | No | The paper provides mathematical equations for the normalized GD update rule (e.g., 'w_{t+1} = w_t − η_t ∇F(w_t) (2)'). However, it does not present this or any other part of the methodology in a structured pseudocode block or algorithm listing. (A code sketch of this update appears after the table.)
Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the methodology described, nor does it provide any links to a code repository.
Open Datasets | Yes | We consider binary classification with the exponential loss using digits 0 and 1 from the MNIST dataset (d = 784) and we set the sample size to n = 1000.
Dataset Splits | No | The paper mentions the use of the MNIST dataset and synthetic datasets, but it does not specify the exact training, validation, or test split percentages or sample counts. While test error is reported, the method of splitting the data (e.g., an '80/10/10 split' or specific numbers of samples per set) is not detailed. The term 'validation' is not used in the context of dataset splits.
Hardware Specification | No | The paper describes the setup for numerical experiments, including the neural network architecture and datasets, but it does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to conduct these experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software components, libraries, or frameworks used in their experiments (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The experiments are conducted on a two-layer neural network with m = 50 hidden neurons with the leaky-ReLU activation function in (6), where α = 0.2 and = 1. The second-layer weights are chosen randomly from a_j ∈ {±1/√m} and kept fixed during training and test time. The first-layer weights are initialized from the standard Gaussian distribution and then normalized to unit norm. We consider binary classification with the exponential loss using digits 0 and 1 from the MNIST dataset (d = 784) and we set the sample size to n = 1000. The step-sizes are fine-tuned to η = 30 and 5 for GD and normalized GD, respectively, so that each line represents the best of each algorithm. (A hypothetical reconstruction of this setup is sketched after the table.)
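
The update rule quoted in the Pseudocode row, w_{t+1} = w_t − η_t ∇F(w_t), is easy to sketch. The snippet below is a minimal illustration, not the authors' code: it assumes the common gradient-norm normalization for the adaptive step size η_t, whereas the paper defines its own schedule (which may, for example, normalize by the loss value instead). The name normalized_gd_step and the arguments grad_fn, eta, and eps are ours.

```python
import numpy as np

def normalized_gd_step(w, grad_fn, eta=1.0, eps=1e-12):
    """One step of w_{t+1} = w_t - eta_t * grad F(w_t).

    eta_t = eta / ||grad F(w_t)|| is assumed here for illustration; the
    paper's normalized GD uses its own adaptive eta_t, which may differ.
    """
    g = grad_fn(w)                           # full-batch gradient of the empirical loss
    eta_t = eta / (np.linalg.norm(g) + eps)  # assumed gradient-norm normalization
    return w - eta_t * g
```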
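
Because the Experiment Setup row fully specifies the architecture, initialization, and loss, a rough reconstruction is possible. The sketch below is a hypothetical reimplementation under those stated choices (m = 50, leaky-ReLU with α = 0.2, second-layer weights fixed at ±1/√m, unit-norm Gaussian first-layer rows, empirical exponential loss); random arrays stand in for the MNIST 0/1 subset with labels in {−1, +1}, and all function names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 784, 50, 1000                 # input dim (MNIST), hidden width, sample size
alpha = 0.2                             # leaky-ReLU negative slope

# Placeholder data: random arrays stand in for the MNIST digits 0/1 subset,
# with labels mapped to {-1, +1} for binary classification.
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)

def leaky_relu(z):
    return np.where(z >= 0, z, alpha * z)

# Second-layer weights a_j drawn uniformly from {+1/sqrt(m), -1/sqrt(m)}, kept fixed.
a = rng.choice([1.0, -1.0], size=m) / np.sqrt(m)

# First-layer weights: standard Gaussian rows, normalized to unit norm.
W0 = rng.standard_normal((m, d))
W0 /= np.linalg.norm(W0, axis=1, keepdims=True)

def forward(W):
    """Network output f(x_i) = sum_j a_j * leaky_relu(<w_j, x_i>) for all samples."""
    return leaky_relu(X @ W.T) @ a                      # shape (n,)

def exp_loss(W):
    """Empirical exponential loss F(W) = (1/n) * sum_i exp(-y_i * f(x_i))."""
    return np.mean(np.exp(-y * forward(W)))

def exp_loss_grad(W):
    """Gradient of F with respect to the trainable first-layer weights W."""
    Z = X @ W.T                                         # (n, m) pre-activations
    sig_prime = np.where(Z >= 0, 1.0, alpha)            # leaky-ReLU derivative
    f = leaky_relu(Z) @ a                               # (n,) network outputs
    coef = -y * np.exp(-y * f) / n                      # (n,) per-sample weights
    # d f(x_i)/d w_j = a_j * sigma'(z_ij) * x_i, accumulated over samples
    return (coef[:, None] * sig_prime * a[None, :]).T @ X   # (m, d)
```

Combined with normalized_gd_step above, a training loop would repeat, e.g., W = normalized_gd_step(W, exp_loss_grad, eta=5.0), using the step size reported as best for normalized GD; plain GD with η = 30 would instead update W -= 30 * exp_loss_grad(W).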