Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

The Implicit Bias of Gradient Descent on Separable Multiclass Data

Authors: Hrithik Ravi, Clay Scott, Daniel Soudry, Yutong Wang

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Appendix I, we show experimental results demonstrating implicit bias towards the hard margin SVM when using the Pair Log Loss, in line with Theorem 3.4.
Researcher Affiliation | Academia | ¹University of Michigan ²Technion – Israel Institute of Technology ³Illinois Institute of Technology
Pseudocode | No | The paper presents theoretical proofs and mathematical derivations but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | Code for recreating the figures can be found at https://github.com/YutongWangML/neurips2024-multiclass-IR-figures
Open Datasets | Yes | We verify this experimentally in the Python notebook checking_conjecture_in_Appendix_H.ipynb available at https://github.com/YutongWangML/neurips2024-multiclass-IR-figures. The code for recreating the figures can be found at https://github.com/YutongWangML/neurips2024-multiclass-IR-figures
Dataset Splits | No | The paper mentions using "synthetically generated linearly separable datasets" and "randomly sampled data" but does not specify any training, validation, or test splits for these datasets.
Hardware Specification | No | The code can be run on Google Colab with a CPU runtime in under one hour.
Software Dependencies | No | The paper mentions that the code can be run on Google Colab with a CPU runtime and provides a GitHub link to Python code, but it does not specify any particular software dependencies with version numbers (e.g., Python version, specific library versions like PyTorch or TensorFlow).
Experiment Setup | No | Theorem 3.4 states a condition on the learning rate, "sufficiently small learning rate 0 < η < 2β⁻¹σ_max⁻²(X)", but the paper does not specify concrete hyperparameter values (e.g., specific learning rate, batch size, number of epochs, optimizer settings) used for the experiments shown in Appendix I.
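
For illustration, the step-size condition quoted above can be sanity-checked with a short Python sketch. This is not the authors' code: the synthetic dataset, the smoothness constant β = 1, and the use of the standard softmax cross-entropy loss in place of the paper's Pair Log Loss are all assumptions made for the sketch. It only shows gradient descent on a synthetically generated, linearly separable multiclass dataset with a step size chosen inside 0 < η < 2β⁻¹σ_max⁻²(X), and inspects the normalized iterate, whose direction is what the implicit-bias result concerns.

```python
# Minimal sketch (not the authors' code): gradient descent on a synthetic,
# linearly separable multiclass dataset, with a step size taken from the
# quoted bound eta < 2 * beta^{-1} * sigma_max^{-2}(X).
# Assumptions: beta = 1 as the loss smoothness constant, and softmax
# cross-entropy as a stand-in for the paper's Pair Log Loss.
import numpy as np

rng = np.random.default_rng(0)

# Three well-separated Gaussian clusters -> linearly separable with high probability.
K, d, n_per = 3, 2, 30
centers = np.array([[4.0, 0.0], [-4.0, 4.0], [-4.0, -4.0]])
X = np.vstack([centers[k] + 0.3 * rng.standard_normal((n_per, d)) for k in range(K)])
y = np.repeat(np.arange(K), n_per)

# Step size strictly inside the quoted admissible range (beta = 1 is assumed).
beta = 1.0
sigma_max = np.linalg.norm(X, ord=2)        # largest singular value of the data matrix
eta = 0.9 * 2.0 / (beta * sigma_max ** 2)

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)    # numerically stable softmax
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

W = np.zeros((d, K))
for t in range(50_000):
    P = softmax(X @ W)                      # (n, K) class probabilities
    P[np.arange(len(y)), y] -= 1.0          # gradient of cross-entropy w.r.t. logits
    W -= eta * (X.T @ P) / len(y)           # full-batch gradient descent step

print("step size eta:", eta)
print("||W|| after training:", np.linalg.norm(W))
print("normalized iterate W / ||W||:\n", W / np.linalg.norm(W))
```

On separable data the norm of W grows without bound while the training loss goes to zero, so the quantity to inspect is the limiting direction W/‖W‖, which the implicit-bias results relate to the hard-margin SVM solution.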