Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced
Authors: Simon S. Du, Wei Hu, Jason D. Lee
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 4, we empirically verify the theoretical result in Section 2. We perform experiments to verify the auto-balancing properties of gradient descent in neural networks with ReLU activation. |
| Researcher Affiliation | Academia | Simon S. Du: Machine Learning Department, School of Computer Science, Carnegie Mellon University. Email: ssdu@cs.cmu.edu; Wei Hu: Computer Science Department, Princeton University. Email: huwei@cs.princeton.edu; Jason D. Lee: Department of Data Sciences and Operations, Marshall School of Business, University of Southern California. Email: jasonlee@marshall.usc.edu |
| Pseudocode | No | The paper describes algorithms but does not include any labeled 'Pseudocode' or 'Algorithm' blocks, nor are the steps formatted in a structured, code-like manner. |
| Open Source Code | No | The paper does not provide any specific statement about releasing source code, nor does it include a link to a code repository for the described methodology. |
| Open Datasets | No | The paper mentions 'Given a training dataset {(x_i, y_i)}_{i=1}^m ⊆ ℝ^d × ℝ^p' and 'We use 1,000 data points', but it does not specify the name of a publicly available dataset, nor does it provide any link, DOI, or formal citation for accessing it. |
| Dataset Splits | No | The paper refers to a 'training dataset' and '1,000 data points' but does not provide any specific information about training, validation, or test dataset splits (e.g., percentages, sample counts, or references to standard splits). |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers (e.g., programming languages, libraries, or frameworks with their versions) used to replicate the experiments. |
| Experiment Setup | Yes | We consider a 3-layer fully connected network of the form f(x) = W_3 φ(W_2 φ(W_1 x)), where x ∈ ℝ^1000 is the input, W_1 ∈ ℝ^{100×1000}, W_2 ∈ ℝ^{100×100}, W_3 ∈ ℝ^{10×100}, and φ(·) is the ReLU activation. We use 1,000 data points and the quadratic loss function, and run GD. We first test a balanced initialization: W_1[i,j] ~ N(0, 10^{-4}/100), W_2[i,j] ~ N(0, 10^{-4}/10), and W_3[i,j] ~ N(0, 10^{-4}). We then test an unbalanced initialization: W_1[i,j] ~ N(0, 10^{-4}), W_2[i,j] ~ N(0, 10^{-4}), and W_3[i,j] ~ N(0, 10^{-4}). After 10,000 iterations we have... and step sizes η_t = 100 / ((t+1) ‖M‖_F^{3/2}). |
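Since the paper releases no code, the setup quoted above can only be approximated. Below is a minimal NumPy sketch (not the authors' implementation; the dimensions, data, step size, and iteration count are scaled-down assumptions for illustration). It runs gradient descent on a 3-layer ReLU network f(x) = W_3 φ(W_2 φ(W_1 x)) with quadratic loss from an "unbalanced" initialization (equal entry variance in every layer), and checks the paper's auto-balancing invariant: the differences of squared Frobenius norms between consecutive layers drift only slightly under small-step GD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (the paper uses a 1000 -> 100 -> 100 -> 10 network
# with 1,000 data points; we shrink everything for a quick run).
d, h, p, m = 50, 20, 5, 100

# "Unbalanced" initialization in the paper's sense: identical entry
# variance in every layer, so squared Frobenius norms differ at t = 0.
std = 0.1
W1 = rng.normal(0.0, std, (h, d))
W2 = rng.normal(0.0, std, (h, h))
W3 = rng.normal(0.0, std, (p, h))

# Synthetic data standing in for the paper's unspecified 1,000 points.
X = rng.normal(size=(d, m))
Y = rng.normal(size=(p, m))

def relu(z):
    return np.maximum(z, 0.0)

def loss_and_grads(W1, W2, W3):
    """Quadratic loss of f(x) = W3 relu(W2 relu(W1 x)), manual backprop."""
    A1 = W1 @ X
    H1 = relu(A1)
    A2 = W2 @ H1
    H2 = relu(A2)
    R = W3 @ H2 - Y                 # residuals
    loss = 0.5 * np.sum(R * R) / m
    G3 = R / m
    dW3 = G3 @ H2.T
    G2 = (W3.T @ G3) * (A2 > 0)     # backprop through ReLU
    dW2 = G2 @ H1.T
    G1 = (W2.T @ G2) * (A1 > 0)
    dW1 = G1 @ X.T
    return loss, dW1, dW2, dW3

def sq_norms(*Ws):
    return [float(np.sum(W * W)) for W in Ws]

eta = 0.005                          # small constant step, not the paper's schedule
n_init = sq_norms(W1, W2, W3)
loss0, _, _, _ = loss_and_grads(W1, W2, W3)
for _ in range(1000):
    _, d1, d2, d3 = loss_and_grads(W1, W2, W3)
    W1 -= eta * d1
    W2 -= eta * d2
    W3 -= eta * d3
loss1, _, _, _ = loss_and_grads(W1, W2, W3)
n_fin = sq_norms(W1, W2, W3)

# The paper's invariant: ||W_{h+1}||_F^2 - ||W_h||_F^2 is conserved by
# gradient flow, so it should change only slightly under small-step GD.
print("loss:", loss0, "->", loss1)
print("||W2||^2 - ||W1||^2:", n_init[1] - n_init[0], "->", n_fin[1] - n_fin[0])
print("||W3||^2 - ||W2||^2:", n_init[2] - n_init[1], "->", n_fin[2] - n_fin[1])
```

Reproducing the paper's figures would additionally require its 1,000-point dataset, the stated Gaussian variances, and the decaying step-size schedule, none of which are fully specified.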