Characterizing Implicit Bias in Terms of Optimization Geometry

Authors: Suriya Gunasekar, Jason Lee, Daniel Soudry, Nathan Srebro

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The empirical results for this problem in Figure 1c clearly show that even for ℓp norms where ‖·‖p² is smooth and strongly convex, the corresponding steepest descent converges to a global minimum that depends on the step size. Figure 1: Dependence of implicit bias on step size and momentum: In (a)–(c)... (a) Mirror descent with primal momentum (Example 2): the global minimum that eq. (8) converges to depends on the momentum parameters; the sub-plots contain the trajectories of eq. (8) for different choices of βt = β and γt = γ; (b) Natural gradient descent (Example 3): for different step sizes ηt = η, eq. (9) converges to different global minima. Here, η was chosen to be small enough to ensure w(t) ∈ dom(ψ).
Researcher Affiliation | Academia | ¹TTI Chicago, USA; ²USC, Los Angeles, USA; ³Technion, Israel. Correspondence to: Suriya Gunasekar <suriya@ttic.edu>, Jason Lee <jasonlee@marshall.usc.edu>, Daniel Soudry <daniel.soudry@gmail.com>, Nathan Srebro <nati@ttic.edu>.
Pseudocode | No | The paper describes algorithms using mathematical equations (e.g., eq. 3, 4, 5, 9, 11, 13) but does not include any blocks explicitly labeled as "Pseudocode" or "Algorithm".
Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the methodology, nor a link to a code repository.
Open Datasets | No | The paper uses simple, illustrative datasets for its examples, such as "dataset {(x1 = [1, 2], y1 = 1)}" and "dataset {(x1 = [1, 1, 1], y1 = 1), (x2 = [1, 2, 0], y2 = 10)}", but does not provide concrete access information (link, DOI, formal citation) for any publicly available or open dataset.
Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or citations to predefined splits) for training, validation, or testing; the examples use small, custom-defined data points for theoretical demonstration.
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU/CPU models, memory) used to run its examples or experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the demonstrations or experiments.
Experiment Setup | Yes | Figure 1: Dependence of implicit bias on step size and momentum: In (a)–(c), the blue line denotes the set G of global minima for the respective examples... (a) Mirror descent with primal momentum (Example 2): the global minimum that eq. (8) converges to depends on the momentum parameters; the sub-plots contain the trajectories of eq. (8) for different choices of βt = β and γt = γ; (b) Natural gradient descent (Example 3): for different step sizes ηt = η, eq. (9) converges to different global minima. Here, η was chosen to be small enough to ensure w(t) ∈ dom(ψ). (c) Steepest descent w.r.t. ‖·‖4/3 (Example 4): the global minimum to which eq. (11) converges depends on η. Here w(0) = [0, 0, 0]...
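The step-size dependence quoted above (Example 4, steepest descent w.r.t. ‖·‖4/3) is easy to reproduce in a few lines. The sketch below is an illustrative reconstruction, not the paper's code: it applies the standard closed-form ℓp steepest-descent step, Δw = −η‖g‖q·u with q = p/(p−1) the dual exponent and u the unit-‖·‖p maximizer of ⟨g, u⟩, to the two-point dataset quoted in the Open Datasets row, starting from w(0) = [0, 0, 0] as in Figure 1c. The choice of step sizes (0.01 vs. 0.15) and iteration count are assumptions for the demo.

```python
import numpy as np

def lp_steepest_step(g, p, eta):
    """Steepest-descent step w.r.t. the l_p norm: the direction u maximizing
    <g, u> over ||u||_p <= 1, scaled by the dual norm ||g||_q, q = p/(p-1)."""
    q = p / (p - 1.0)
    gq = np.linalg.norm(g, ord=q)
    if gq == 0.0:
        return np.zeros_like(g)
    u = np.sign(g) * np.abs(g) ** (q - 1) / gq ** (q - 1)
    return -eta * gq * u

def run(eta, p=4 / 3, iters=50_000):
    # Two-point underdetermined dataset quoted in the report (Example 4).
    X = np.array([[1.0, 1.0, 1.0], [1.0, 2.0, 0.0]])
    y = np.array([1.0, 10.0])
    w = np.zeros(3)                      # w(0) = [0, 0, 0]
    for _ in range(iters):
        g = X.T @ (X @ w - y)            # gradient of 0.5 * ||Xw - y||^2
        w = w + lp_steepest_step(g, p, eta)
    return w

w_small, w_large = run(eta=0.01), run(eta=0.15)
# Both runs drive the loss to (numerical) zero, i.e. both reach global
# minima, yet the limit points differ: the implicit bias depends on eta.
```

Both limits satisfy Xw = y to numerical precision, but they are different points on the solution set G, mirroring Figure 1c's observation that for ℓp steepest descent the global minimum reached depends on the step size.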