Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Implicit Bias of Gradient Descent for Non-Homogeneous Deep Networks
Authors: Yuhang Cai, Kangjie Zhou, Jingfeng Wu, Song Mei, Michael Lindsey, Peter Bartlett
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We establish the asymptotic implicit bias of gradient descent (GD) for generic non-homogeneous deep networks under exponential loss. Specifically, we characterize three key properties of GD iterates starting from a sufficiently small empirical risk, where the threshold is determined by a measure of the network's non-homogeneity. First, we show that a normalized margin induced by the GD iterates increases nearly monotonically. Second, we prove that while the norm of the GD iterates diverges to infinity, the iterates themselves converge in direction. Finally, we establish that this directional limit satisfies the Karush-Kuhn-Tucker (KKT) conditions of a margin maximization problem. Prior works on implicit bias have focused exclusively on homogeneous networks; in contrast, our results apply to a broad class of non-homogeneous networks satisfying a mild near-homogeneity condition. |
| Researcher Affiliation | Collaboration | 1University of California, Berkeley 2Columbia University 3Lawrence Berkeley National Laboratory 4Google DeepMind. Correspondence to: Yuhang Cai <EMAIL>, Kangjie Zhou <EMAIL>, Peter L. Bartlett <EMAIL>. |
| Pseudocode | No | The paper focuses on mathematical definitions, theorems, and proofs (e.g., Definition 1, Theorem 3.2, Lemma C.14, etc.) without presenting any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to code repositories. The 'Impact Statement' section does not refer to code availability. |
| Open Datasets | No | The paper refers to a 'binary classification dataset, where x_i ∈ ℝ^d and y_i ∈ {±1} for all i ∈ [n]' and a 'symmetric and linearly separable dataset' as a theoretical example. However, no specific dataset names (e.g., MNIST, CIFAR-10), links, DOIs, or citations to publicly available datasets are provided. |
| Dataset Splits | No | The paper is theoretical and does not mention specific datasets or their splits (e.g., training/test/validation percentages or counts). It refers to 'training data' in a general sense for theoretical analysis. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments, such as GPU models, CPU types, or cloud computing resources. |
| Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies or library versions (e.g., Python, PyTorch, TensorFlow versions) that would be required to reproduce experimental results. |
| Experiment Setup | No | The paper is theoretical and does not contain details about experimental setups, such as hyperparameter values (learning rates, batch sizes, number of epochs), optimizer settings, or model initialization strategies. |
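The implicit-bias phenomenon the abstract describes (normalized margin increasing, norm diverging, direction converging toward a margin-maximizing solution) can be illustrated numerically. The sketch below is not the paper's setting — the paper treats non-homogeneous deep networks — but the classical linear case under exponential loss, where the same three behaviors are well known. The toy dataset and step size are arbitrary choices for illustration.

```python
import numpy as np

# Toy linearly separable binary data, with labels folded into the rows:
# each row is y_i * x_i, so correct classification means w @ z > 0.
Z = np.array([[2.0, 1.0],
              [1.0, 2.0],
              [1.5, -0.5]])

w = np.zeros(2)
lr = 0.1
margins = []
for t in range(20000):
    s = Z @ w  # per-example margins y_i <w, x_i>
    # Gradient of the exponential loss sum_i exp(-y_i <w, x_i>)
    grad = -(np.exp(-s)[:, None] * Z).sum(axis=0)
    w -= lr * grad
    norm = np.linalg.norm(w)
    if norm > 0:
        # Normalized margin: min_i y_i <w, x_i> / ||w||
        margins.append((Z @ w).min() / norm)

print("final norm:", np.linalg.norm(w))      # grows without bound (slowly, ~log t)
print("normalized margin:", margins[-1])     # increases toward the max-margin value
```

After the iterates enter the low-risk regime, the normalized margin rises nearly monotonically while ||w|| keeps growing, so only the direction w/||w|| stabilizes — the linear-case analogue of the directional convergence and KKT characterization stated in the abstract.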