Towards Understanding Learning in Neural Networks with Linear Teachers

Authors: Roei Sarussi, Alon Brutzkus, Amir Globerson

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical results that validate our theoretical analysis. We also provide empirical evaluation that confirms that weight clustering indeed explains why approximate linear decision boundaries are learned. |
| Researcher Affiliation | Academia | The Blavatnik School of Computer Science, Tel Aviv University. Correspondence to: Alon Brutzkus <alonbrutzkus@mail.tau.ac.il>. |
| Pseudocode | No | The paper describes optimization algorithms like SGD and gradient flow conceptually and mathematically, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | A network is trained on Gaussian data and binary MNIST problems. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages or sample counts) for training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks) used in the experiments. |
| Experiment Setup | Yes | The network has 100 neurons, initialized from a Gaussian with standard deviation 0.001 for small initialization and 30 for large initialization. We consider the case where L_S(W) is minimized using SGD in epochs with a batch size of one and a learning rate η. (See the sketch below.) |
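The Experiment Setup row pins down only part of the training configuration. Below is a minimal PyTorch sketch of that setup, not the authors' implementation: the 100 hidden neurons, the Gaussian initialization with standard deviation 0.001 (small) or 30 (large), and SGD with a batch size of one come from the paper's description quoted above; the leaky-ReLU activation, fixed ±1 output weights, logistic (soft-margin) loss, input dimension, learning rate, and epoch count are illustrative assumptions.

```python
# Minimal sketch of the reported setup; NOT the authors' code.
# From the table above: 100 hidden neurons, Gaussian init with
# std 0.001 (small) or 30 (large), SGD with batch size one.
# Assumed for illustration: leaky-ReLU hidden layer, fixed +/-1
# output weights, logistic loss, dimension, learning rate, epochs.
import torch
import torch.nn.functional as F

def init_weights(n_hidden=100, d=2, init_std=0.001):
    """First-layer weights drawn i.i.d. from N(0, init_std^2)."""
    w = torch.randn(n_hidden, d) * init_std
    return w.requires_grad_(True)

def forward(w, x, alpha=0.1):
    """Two-layer net: leaky-ReLU hidden units, fixed second-layer
    weights (+1 on the first half of the neurons, -1 on the rest)."""
    h = F.leaky_relu(x @ w.t(), negative_slope=alpha)
    n = w.shape[0]
    u = torch.cat([torch.ones(n // 2), -torch.ones(n - n // 2)])
    return h @ u

def train(w, X, y, lr=0.01, epochs=5):
    """SGD in epochs with a batch size of one, as the table describes."""
    opt = torch.optim.SGD([w], lr=lr)
    for _ in range(epochs):
        for i in torch.randperm(len(X)):
            opt.zero_grad()
            out = forward(w, X[i:i + 1])
            # soft_margin_loss is the logistic loss on +/-1 labels
            loss = F.soft_margin_loss(out, y[i:i + 1])
            loss.backward()
            opt.step()
    return w

# Linearly separable Gaussian data labeled by a linear teacher sign(v.x).
torch.manual_seed(0)
d = 2
X = torch.randn(500, d)
v = torch.randn(d)
y = torch.where(X @ v >= 0, 1.0, -1.0)

w = train(init_weights(init_std=0.001), X, y)  # small init; use 30 for large
acc = (torch.sign(forward(w, X)) == y).float().mean()
print(f"train accuracy: {acc:.3f}")
```

Swapping `init_std` between 0.001 and 30 reproduces the small- versus large-initialization regimes the row describes; for the binary MNIST experiments mentioned in the Open Datasets row, `X` and `y` would instead be flattened images from two digit classes with ±1 labels.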