The Implicit Bias of Gradient Descent on Separable Multiclass Data

Authors: Hrithik Ravi, Clay Scott, Daniel Soudry, Yutong Wang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Appendix I, we show experimental results demonstrating implicit bias towards the hard margin SVM when using the Pair Log Loss, in line with Theorem 3.4. |
| Researcher Affiliation | Academia | ¹University of Michigan, ²Technion – Israel Institute of Technology, ³Illinois Institute of Technology |
| Pseudocode | No | The paper presents theoretical proofs and mathematical derivations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for recreating the figures can be found at https://github.com/YutongWangML/neurips2024-multiclass-IR-figures |
| Open Datasets | Yes | We verify this experimentally in the Python notebook checking_conjecture_in_Appendix_H.ipynb available at https://github.com/YutongWangML/neurips2024-multiclass-IR-figures |
| Dataset Splits | No | The paper mentions using "synthetically generated linearly separable datasets" and "randomly sampled data" but does not specify any training, validation, or test splits for these datasets. |
| Hardware Specification | No | The code can be run on Google Colab with a CPU runtime in under one hour. |
| Software Dependencies | No | The paper states the code can be run on Google Colab with a CPU runtime and links to Python code, but it does not specify software dependencies with version numbers (e.g., a Python version, or library versions such as PyTorch or TensorFlow). |
| Experiment Setup | No | Theorem 3.4 states a condition on the learning rate, "sufficiently small learning rate $0 < \eta < 2\beta^{-1}\sigma_{\max}^{-2}(X)$", but the paper does not specify concrete hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) for the experiments shown in Appendix I. |
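To make the step-size condition quoted in the Experiment Setup row concrete, here is a minimal sketch of evaluating the bound $2\beta^{-1}\sigma_{\max}^{-2}(X)$. The data matrix `X` and the value $\beta = 1$ are illustrative assumptions on our part, not values taken from the paper.

```python
# Hypothetical check of the admissible step-size range from Theorem 3.4:
# 0 < eta < 2 * beta^{-1} * sigma_max^{-2}(X).
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 2))  # assumed n x d data matrix (not the paper's data)

beta = 1.0                                         # assumed smoothness constant of the loss
sigma_max = np.linalg.svd(X, compute_uv=False)[0]  # largest singular value of X
eta_bound = 2.0 / (beta * sigma_max**2)
print(f"Theorem 3.4 admits step sizes 0 < eta < {eta_bound:.4f}")
```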
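The Research Type row quotes experiments in Appendix I demonstrating implicit bias towards the hard-margin SVM. As a self-contained illustration of that behavior (not a reproduction of the paper's experiments), the sketch below runs full-batch gradient descent with standard softmax cross-entropy rather than the paper's Pair Log Loss; the synthetic dataset, step size, and iteration budget are all our assumptions. Consistent with the implicit-bias result, the printed normalized multiclass margin should increase toward its hard-margin SVM value as training proceeds.

```python
# Illustrative sketch only: full-batch gradient descent with softmax
# cross-entropy on synthetic linearly separable 3-class data. The implicit
# bias result predicts the normalized multiclass margin
#     min_i ( <w_{y_i}, x_i> - max_{k != y_i} <w_k, x_i> ) / ||W||_F
# grows toward the hard-margin SVM margin as training proceeds.
import numpy as np

rng = np.random.default_rng(0)
K, d, n = 3, 2, 60

# Three well-separated clusters -> linearly separable multiclass data.
centers = np.array([[4.0, 0.0], [-2.0, 3.5], [-2.0, -3.5]])
X = np.vstack([c + 0.3 * rng.standard_normal((n // K, d)) for c in centers])
y = np.repeat(np.arange(K), n // K)

# Step size below the Theorem 3.4 bound from the previous snippet (beta = 1 assumed).
sigma_max = np.linalg.svd(X, compute_uv=False)[0]
eta = 1.9 / sigma_max**2

W = np.zeros((K, d))
for t in range(1, 200_001):
    logits = X @ W.T                                  # (n, K) class scores
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)                 # softmax probabilities
    P[np.arange(n), y] -= 1.0                         # gradient wrt logits: p - onehot
    W -= eta * (P.T @ X) / n                          # full-batch GD step

    if t % 50_000 == 0:
        scores = X @ W.T
        true_scores = scores[np.arange(n), y].copy()
        scores[np.arange(n), y] = -np.inf             # mask the true class
        gamma = (true_scores - scores.max(axis=1)).min() / np.linalg.norm(W)
        print(f"iter {t:>6d}  normalized margin = {gamma:.4f}")
```

The margin printed here is the Crammer-Singer-style multiclass margin normalized by the Frobenius norm of the weight matrix; its monotone growth under gradient descent is the qualitative signature of the hard-margin bias that the paper establishes for its loss.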