The Implicit Bias of Gradient Descent on Separable Multiclass Data
Authors: Hrithik Ravi, Clayton Scott, Daniel Soudry, Yutong Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Appendix I, we show experimental results demonstrating implicit bias towards the hard margin SVM when using the Pair Log Loss, in line with Theorem 3.4. |
| Researcher Affiliation | Academia | University of Michigan; Technion - Israel Institute of Technology; Illinois Institute of Technology |
| Pseudocode | No | The paper presents theoretical proofs and mathematical derivations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for recreating the figures can be found at https://github.com/YutongWangML/neurips2024-multiclass-IR-figures |
| Open Datasets | Yes | We verify this experimentally in the Python notebook checking_conjecture_in_Appendix_H.ipynb available at https://github.com/YutongWangML/neurips2024-multiclass-IR-figures, the same repository that contains the code for recreating the figures. |
| Dataset Splits | No | The paper mentions using "synthetically generated linearly separable datasets" and "randomly sampled data" but does not specify any training, validation, or test splits for these datasets. |
| Hardware Specification | No | The code can be run on Google Colab with a CPU runtime in under one hour. |
| Software Dependencies | No | The paper mentions that the code can be run on Google Colab with a CPU runtime and provides a GitHub link to Python code, but it does not specify any particular software dependencies with version numbers (e.g., Python version, specific library versions like PyTorch or TensorFlow). |
| Experiment Setup | No | Theorem 3.4 states a condition on the learning rate as "sufficiently small learning rate $0 < \eta < 2\beta^{-1}\sigma_{\max}^{-2}(X)$" but the paper does not specify concrete hyperparameter values (e.g., specific learning rate, batch size, number of epochs, optimizer settings) used for the experiments shown in Appendix I. |
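
The kind of implicit-bias experiment referenced in the Research Type and Experiment Setup rows can be illustrated with a short, self-contained sketch. This is not the authors' notebook: it uses synthetic Gaussian blobs, the standard multiclass cross-entropy loss rather than the PairLogLoss of Theorem 3.4, and an arbitrary small step size assumed to lie below the bound $2\beta^{-1}\sigma_{\max}^{-2}(X)$; all variable names are illustrative.

```python
# Minimal sketch (not the authors' code): gradient descent on a linear model
# with multiclass cross-entropy over synthetic, linearly separable data.
# The implicit-bias prediction is that W / ||W|| approaches the direction of
# the hard-margin multiclass SVM solution as training continues.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 60, 2, 3                                  # samples, features, classes

# Three well-separated Gaussian blobs => linearly separable multiclass data.
centers = np.array([[4.0, 0.0], [-2.0, 4.0], [-2.0, -4.0]])
y = rng.integers(0, k, size=n)
X = centers[y] + 0.3 * rng.standard_normal((n, d))
Y = np.eye(k)[y]                                    # one-hot labels, shape (n, k)

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)            # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

W = np.zeros((d, k))                                # linear classifier weights
eta = 0.05                                          # assumed to satisfy the
                                                    # small-step-size condition
for _ in range(100_000):
    P = softmax(X @ W)                              # predicted class probabilities
    W -= eta * X.T @ (P - Y) / n                    # gradient of averaged cross-entropy

# The norm of W grows without bound, but its direction stabilizes.
print("||W|| =", np.linalg.norm(W))
print("normalized iterate:\n", W / np.linalg.norm(W))
```

A hard-margin reference direction for comparison can be obtained, up to scaling, from a Crammer-Singer multiclass SVM, for example scikit-learn's `LinearSVC(multi_class="crammer_singer")` with a large `C`; agreement of the two normalized weight matrices is the qualitative behavior that the paper's Appendix I experiments demonstrate for the PairLogLoss.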