Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced

Authors: Simon S. Du, Wei Hu, Jason D. Lee

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 4, we empirically verify the theoretical result in Section 2. We perform experiments to verify the auto-balancing properties of gradient descent in neural networks with ReLU activation. (The balancedness invariant being verified is restated below the table.)
Researcher Affiliation | Academia | Simon S. Du: Machine Learning Department, School of Computer Science, Carnegie Mellon University. Email: ssdu@cs.cmu.edu; Wei Hu: Computer Science Department, Princeton University. Email: huwei@cs.princeton.edu; Jason D. Lee: Department of Data Sciences and Operations, Marshall School of Business, University of Southern California. Email: jasonlee@marshall.usc.edu
Pseudocode | No | The paper describes algorithms but does not include any labeled 'Pseudocode' or 'Algorithm' blocks, nor are the steps formatted in a structured, code-like manner.
Open Source Code | No | The paper does not provide any statement about releasing source code, nor does it include a link to a code repository for the described methodology.
Open Datasets | No | The paper mentions 'Given a training dataset {(x_i, y_i)}_{i=1}^m ⊂ R^d × R^p' and 'We use 1,000 data points', but it does not specify the name of a publicly available dataset, nor does it provide any link, DOI, or formal citation for accessing it.
Dataset Splits | No | The paper refers to a 'training dataset' and '1,000 data points' but does not provide any information about training, validation, or test dataset splits (e.g., percentages, sample counts, or references to standard splits).
Hardware Specification | No | The paper does not provide any hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper does not list software dependencies or version numbers (e.g., programming languages, libraries, or frameworks) needed to replicate the experiments.
Experiment Setup | Yes | We consider a 3-layer fully connected network of the form f(x) = W3 φ(W2 φ(W1 x)), where x ∈ R^1000 is the input, W1 ∈ R^{100×1000}, W2 ∈ R^{100×100}, W3 ∈ R^{10×100}, and φ(·) is the ReLU activation. We use 1,000 data points and the quadratic loss function, and run GD. We first test a balanced initialization: W1[i,j] ~ N(0, 10^{-4}/100), W2[i,j] ~ N(0, 10^{-4}/10) and W3[i,j] ~ N(0, 10^{-4}). We then test an unbalanced initialization: W1[i,j] ~ N(0, 10^{-4}), W2[i,j] ~ N(0, 10^{-4}) and W3[i,j] ~ N(0, 10^{-4}). After 10,000 iterations we have... and step sizes η_t = 100/((t+1)·||M||_F^{3/2}). (A code sketch of this setup follows the table.)
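
Note on the property under test: the theoretical result verified in Section 4 is the paper's gradient-flow result (Section 2), namely that the differences between squared Frobenius norms of consecutive layers stay invariant throughout training. In LaTeX, for an N-layer homogeneous network with weights W_1, ..., W_N:

    \frac{d}{dt}\left( \|W_{h+1}(t)\|_F^2 - \|W_h(t)\|_F^2 \right) = 0,
    \qquad h = 1, \ldots, N - 1.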
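
Below is a minimal PyTorch sketch, not the authors' code (none was released, per the table), of the quoted neural-network setup. The synthetic inputs and targets, the random seed, and the constant learning rate lr are assumptions; the quoted step-size schedule involving ||M||_F appears to belong to the paper's matrix-factorization experiment, not this network. The layer shapes, initialization variances, quadratic loss, and 10,000 full-batch GD iterations follow the quoted setup.

    # Minimal sketch of the Section 4 neural-network experiment (assumptions noted).
    import torch

    torch.manual_seed(0)                  # assumption: seed not given in the paper
    n, d_in, h, d_out = 1000, 1000, 100, 10

    X = torch.randn(n, d_in)              # 1,000 synthetic data points
    Y = torch.randn(n, d_out)             # synthetic targets (assumption)

    def init(balanced):
        # Entry variances; the balanced choice equalizes E||W_h||_F^2 across layers.
        var = [1e-4 / 100, 1e-4 / 10, 1e-4] if balanced else [1e-4, 1e-4, 1e-4]
        shapes = [(h, d_in), (h, h), (d_out, h)]
        return [(torch.randn(*s) * v ** 0.5).requires_grad_()
                for s, v in zip(shapes, var)]

    def loss(Ws):
        W1, W2, W3 = Ws
        out = torch.relu(torch.relu(X @ W1.T) @ W2.T) @ W3.T  # f(x) = W3 φ(W2 φ(W1 x))
        return 0.5 * ((out - Y) ** 2).mean()                  # quadratic loss

    def run(balanced, steps=10_000, lr=1e-2):                 # lr is an assumed value
        Ws = init(balanced)
        for _ in range(steps):
            grads = torch.autograd.grad(loss(Ws), Ws)
            with torch.no_grad():                             # plain gradient descent
                for W, g in zip(Ws, grads):
                    W -= lr * g
        return [W.norm().item() ** 2 for W in Ws]             # ||W_h||_F^2 per layer

    for balanced in (True, False):
        tag = "balanced  " if balanced else "unbalanced"
        print(tag, "||W_h||_F^2 after GD:", run(balanced))

By the invariant above, the gaps ||W_{h+1}||_F^2 - ||W_h||_F^2 should stay approximately constant over the run in both cases; with the balanced initialization all three squared norms also start, and hence remain, close to one another.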