Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Implicit Bias of Gradient Descent for Non-Homogeneous Deep Networks

Authors: Yuhang Cai, Kangjie Zhou, Jingfeng Wu, Song Mei, Michael Lindsey, Peter Bartlett

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We establish the asymptotic implicit bias of gradient descent (GD) for generic non-homogeneous deep networks under exponential loss. Specifically, we characterize three key properties of GD iterates starting from a sufficiently small empirical risk, where the threshold is determined by a measure of the network's non-homogeneity. First, we show that a normalized margin induced by the GD iterates increases nearly monotonically. Second, we prove that while the norm of the GD iterates diverges to infinity, the iterates themselves converge in direction. Finally, we establish that this directional limit satisfies the Karush-Kuhn-Tucker (KKT) conditions of a margin maximization problem. Prior works on implicit bias have focused exclusively on homogeneous networks; in contrast, our results apply to a broad class of non-homogeneous networks satisfying a mild near-homogeneity condition.
Researcher Affiliation | Collaboration | ¹University of California, Berkeley; ²Columbia University; ³Lawrence Berkeley National Laboratory; ⁴Google DeepMind. Correspondence to: Yuhang Cai <EMAIL>, Kangjie Zhou <EMAIL>, Peter L. Bartlett <EMAIL>.
Pseudocode | No | The paper focuses on mathematical definitions, theorems, and proofs (e.g., Definition 1, Theorem 3.2, Lemma C.14, etc.) without presenting any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to code repositories. The 'Impact Statement' section does not refer to code availability.
Open Datasets | No | The paper refers to a 'binary classification dataset, where xᵢ ∈ ℝᵈ and yᵢ ∈ {±1} for all i ∈ [n]' and a 'symmetric and linearly separable dataset' as a theoretical example. However, no specific dataset names (e.g., MNIST, CIFAR-10), links, DOIs, or citations to publicly available datasets are provided.
Dataset Splits | No | The paper is theoretical and does not mention specific datasets or their splits (e.g., training/test/validation percentages or counts). It refers to 'training data' in a general sense for theoretical analysis.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments, such as GPU models, CPU types, or cloud computing resources.
Software Dependencies | No | The paper is theoretical and does not list any specific software dependencies or library versions (e.g., Python, PyTorch, TensorFlow versions) that would be required to reproduce experimental results.
Experiment Setup | No | The paper is theoretical and does not contain details about experimental setups, such as hyperparameter values (learning rates, batch sizes, number of epochs), optimizer settings, or model initialization strategies.
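The abstract describes normalized-margin growth, norm divergence, and directional convergence of GD under exponential loss. The paper itself contains no experiments; as a hedged illustration only, the sketch below runs GD on a toy *linear* predictor (the simplest, homogeneous special case, not the paper's non-homogeneous deep-network setting) over a symmetric, linearly separable dataset, and tracks the normalized margin minᵢ yᵢ⟨w, xᵢ⟩ / ‖w‖, which rises toward the max-margin value (3/√2 for this data) while ‖w‖ grows without bound. The dataset, step size, and initialization are arbitrary choices for illustration.

```python
import numpy as np

# Toy sketch (not from the paper): gradient descent on the exponential
# loss sum_i exp(-y_i <w, x_i>) for a linear predictor on a symmetric,
# linearly separable 2D dataset. The normalized margin increases toward
# the max-margin value 3/sqrt(2) while ||w|| diverges; this is the
# linear analogue of the implicit-bias phenomenon the paper analyzes.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = np.array([1.0, 0.0])  # arbitrary nonzero initialization
lr = 0.1
margins = []
for _ in range(2000):
    z = y * (X @ w)                    # per-example unnormalized margins
    w -= lr * (-(y * np.exp(-z)) @ X)  # GD step on sum_i exp(-z_i)
    margins.append(np.min(y * (X @ w)) / np.linalg.norm(w))

print(f"normalized margin: {margins[0]:.3f} -> {margins[-1]:.3f}")
print(f"||w||: {np.linalg.norm(w):.3f}")
```

The normalized margin approaches but never exceeds the max-margin value, and the parameter norm keeps growing; for deep non-homogeneous networks the paper proves the analogous statements via KKT conditions of a margin maximization problem.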