Characterizing Implicit Bias in Terms of Optimization Geometry
Authors: Suriya Gunasekar, Jason Lee, Daniel Soudry, Nathan Srebro
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirical results for this problem in Figure 1c clearly show that even for ℓp norms where the squared norm ‖·‖p² is smooth and strongly convex, the corresponding steepest descent converges to a global minimum that depends on the step size (see the Figure 1 caption, quoted in full under Experiment Setup below). |
| Researcher Affiliation | Academia | 1 TTI Chicago, USA 2 USC Los Angeles, USA 3 Technion, Israel. Correspondence to: Suriya Gunasekar <suriya@ttic.edu>, Jason Lee <jasonlee@marshall.usc.edu>, Daniel Soudry <daniel.soudry@gmail.com>, Nathan Srebro <nati@ttic.edu>. |
| Pseudocode | No | The paper describes algorithms using mathematical equations (e.g., eq. 3, 4, 5, 9, 11, 13) but does not include any blocks explicitly labeled as "Pseudocode" or "Algorithm". |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code for the methodology or a link to a code repository. |
| Open Datasets | No | The paper uses simple, illustrative datasets for its examples, such as "dataset {(x1 = [1, 2], y1 = 1)}" and "dataset {(x1 = [1, 1, 1], y1 = 1), (x2 = [1, 2, 0], y2 = 10)}", but does not provide concrete access information (link, DOI, formal citation) for any publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or citations to predefined splits) for training, validation, or testing. The examples use small, custom-defined data points for theoretical demonstration. |
| Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU/CPU models, memory) used to run its examples or experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the demonstrations or experiments. |
| Experiment Setup | Yes | Figure 1: Dependence of implicit bias on step size and momentum: In (a)–(c), the blue line denotes the set G of global minima for the respective examples... (a) Mirror descent with primal momentum (Example 2): the global minimum that eq. (8) converges to depends on the momentum parameters; the sub-plots contain the trajectories of eq. (8) for different choices of βt = β and γt = γ; (b) Natural gradient descent (Example 3): for different step sizes ηt = η, eq. (9) converges to different global minima. Here, η was chosen to be small enough to ensure w(t) ∈ dom(ψ). (c) Steepest descent w.r.t. ‖·‖4/3 (Example 4): the global minimum to which eq. (11) converges depends on η. Here w(0) = [0, 0, 0]... |
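The step-size dependence quoted above can be reproduced in a few lines. The sketch below is not the authors' code (none was released); it is a minimal NumPy implementation of steepest descent w.r.t. an ℓp norm on the squared loss, using the standard closed-form direction argmin_v ⟨g, v⟩ + ½‖v‖p², and the two-point dataset {(x1 = [1, 1, 1], y1 = 1), (x2 = [1, 2, 0], y2 = 10)} quoted in the Open Datasets row. The step sizes and iteration counts are illustrative choices, not values from the paper.

```python
import numpy as np

def steepest_descent_lp(X, y, p, eta, iters):
    """Steepest descent w.r.t. the l_p norm on L(w) = 0.5*||Xw - y||^2,
    started at w = 0. The direction minimizing <g, v> + 0.5*||v||_p^2
    is v = -sign(g) * |g|^(q-1) * ||g||_q^(2-q), with q the dual exponent."""
    q = p / (p - 1)
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        g = X.T @ (X @ w - y)
        gn = np.linalg.norm(g, ord=q)
        if gn < 1e-14:          # gradient vanished: at a global minimum
            break
        v = -np.sign(g) * np.abs(g) ** (q - 1) * gn ** (2 - q)
        w = w + eta * v
    return w

# Two-point dataset quoted in the paper's illustrative examples.
X = np.array([[1., 1., 1.], [1., 2., 0.]])
y = np.array([1., 10.])

# p = 2 reduces to plain gradient descent: the limit is the minimum
# l2-norm solution, independent of the step size.
w_gd_small = steepest_descent_lp(X, y, p=2, eta=0.02, iters=20000)
w_gd_large = steepest_descent_lp(X, y, p=2, eta=0.2, iters=20000)
w_minnorm = np.linalg.pinv(X) @ y

# p = 4/3 (Example 4): both step sizes drive the loss to zero, but the
# global minima they reach differ -- the implicit bias depends on eta.
w_small = steepest_descent_lp(X, y, p=4/3, eta=0.02, iters=20000)
w_large = steepest_descent_lp(X, y, p=4/3, eta=0.2, iters=20000)
```

For p = 2 the direction formula collapses to v = -g, recovering ordinary gradient descent; for p = 4/3 the dual exponent is q = 4 and the direction is a nonlinearly rescaled gradient, which is what makes the limit point sensitive to the step size.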