Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

General Loss Functions Lead to (Approximate) Interpolation in High Dimensions

Authors: Kuo-Wei Lai, Vidya Muthukumar

JMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
Recent empirical and theoretical work provides a mixed and incomplete picture of the impact of loss. On one hand, large-scale empirical studies (Hui and Belkin, 2020; Kline and Berardi, 2005; Golik et al., 2013; Janocha and Czarnecki, 2017) have shown that the less popular squared loss generates surprisingly competitive performance to the popular cross-entropy loss (the multiclass extension of the binary logistic loss). [...] Finally, in Section 4 we provide partial evidence for the tightness of our arguments. First, in Proposition 13 we show that the conditions for exact equivalence in Theorem 4 are not only sufficient but necessary. [...] Figure 3: Panel (a) illustrates the relationship between the vectors q, g^{-1}(q), and 1 for the loss function ℓ(z) = 1/(1−z). Panel (b) is a simulation that compares the implicit bias of gradient descent to the MNI. [...] Figure 4: Panel (a) compares the implicit bias of gradient descent to the one-vs-all MNI. [...] Panel (b) visualizes the normalized training data margins induced by importance weighting on different loss functions in Corollary 14. [...] In Appendix E, we provide corresponding simulations on random data.
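The minimum-norm interpolator (MNI) that the quoted figure captions compare gradient descent against is the least-norm solution of Xθ = y in the overparameterized regime. A minimal numpy sketch (the dimensions and Gaussian data here are illustrative choices, not the paper's simulation code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 1000  # overparameterized: d > n, so Xθ = y has many solutions
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)

# Minimum-norm interpolator: θ = X^T (X X^T)^{-1} y, the least-norm solution
# of Xθ = y; equivalently np.linalg.pinv(X) @ y when X has full row rank.
theta_mni = X.T @ np.linalg.solve(X @ X.T, y)
```

By construction `X @ theta_mni` reproduces the labels exactly, which is the "interpolation" the paper's title refers to.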
Researcher Affiliation: Academia
Kuo-Wei Lai (EMAIL), School of Electrical & Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA. Vidya Muthukumar (EMAIL), School of Electrical & Computer Engineering and H. Milton Stewart School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA.
Pseudocode: No
The paper describes algorithms and methods through mathematical formulations and textual descriptions but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: No
The paper states 'License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v26/23-1078.html.' This link refers to attribution requirements for the paper itself, not to source code for the methodology presented in the paper. There is no explicit statement that the authors' code is made available.
Open Datasets: No
The paper describes data generation for simulations, such as: 'Assume independent and identically distributed data {x_i, y_i}_{i=1}^n such that each covariate satisfies one of the following: a) x_i ∼ N(0, Σ), and we denote the spectrum of Σ by λ; or b) x_i = diag(λ)^{1/2} z_i, where z_i has independent entries such that each z_{ij} is mean-zero, unit-variance, and sub-Gaussian with parameter v > 0.' It does not refer to any specific publicly available datasets with access information.
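As a sketch, the two covariate models quoted above could be instantiated as follows; the spectrum λ and the Rademacher choice for z_{ij} are illustrative assumptions (the paper leaves both general):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 500
lam = 1.0 / np.arange(1, d + 1)  # hypothetical spectrum λ of Σ = diag(λ)

# (a) Gaussian covariates: x_i ~ N(0, Σ) with Σ = diag(λ)
X_gauss = rng.normal(size=(n, d)) * np.sqrt(lam)

# (b) sub-Gaussian covariates: x_i = diag(λ)^{1/2} z_i, where each z_{ij} is
# mean-zero, unit-variance, sub-Gaussian (Rademacher ±1 entries shown here)
Z = rng.choice([-1.0, 1.0], size=(n, d))
X_subg = Z * np.sqrt(lam)
```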
Dataset Splits: Yes
Figure 3: [...] Panel (b) is a simulation that compares the implicit bias of gradient descent to the MNI. The covariate-response pairs {x_i, y_i}_{i=1}^n are independently and identically distributed (IID) with a fixed sample size n = 100 and varying data dimension d, where x_i is isotropic Gaussian and y_i is uniformly distributed in {−1, +1}. [...] Figure 4: Panel (a) compares the implicit bias of gradient descent to the one-vs-all MNI. The simulation setup is the same as Figure 3b with K = 5 classes, and labels drawn uniformly at random in [K]. Panel (b) visualizes the normalized training data margins induced by importance weighting on different loss functions in Corollary 14. We consider the idealized assumption XX^T = αI with n = 100 and d = 5000. The first 70 examples are majority examples labeled y_i = +1, and the remaining 30 examples are minority examples labeled y_i = −1. Note that we apply the importance weighting factor Q = 2.0 only to the minority examples.
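The Figure 4b setup in the quote above can be sketched as follows; the concrete construction of X satisfying XX^T = αI and the choice α = d are illustrative assumptions, not the paper's code:

```python
import numpy as np

n, d, Q = 100, 5000, 2.0
y = np.concatenate([np.ones(70), -np.ones(30)])  # 70 majority (+1), 30 minority (−1)
w = np.where(y == -1.0, Q, 1.0)                  # importance weight on minority only

# One concrete X satisfying the idealized assumption XX^T = αI: n orthogonal rows
# of equal squared norm α embedded in d dimensions; α = d is an arbitrary choice.
alpha = float(d)
X = np.zeros((n, d))
X[np.arange(n), np.arange(n)] = np.sqrt(alpha)
```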
Hardware Specification: No
The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models, memory specifications, or cloud/cluster configurations.
Software Dependencies: No
The paper does not specify the software dependencies, libraries, or version numbers used for the experiments.
Experiment Setup: Yes
Figure 3: Panel (b) is a simulation that compares the implicit bias of gradient descent to the MNI. [...] Gradient descent is run for at most 10^3 iterations, stopping early once the empirical risk falls below 10^-12. [...] Figure 4: Panel (b) visualizes the normalized training data margins induced by importance weighting on different loss functions in Corollary 14. [...] We run gradient descent on different loss functions for at most 10^4 iterations, stopping early once the empirical risk falls below 10^-12.
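The stopping rule in the quoted setup (an iteration cap, with early exit once the empirical risk falls below 10^-12) can be sketched as follows; the logistic loss, learning rate, and toy data are illustrative assumptions, not the paper's choices:

```python
import numpy as np

def run_gd(X, y, lr=0.1, max_iters=10**3, tol=1e-12):
    """Gradient descent on the logistic empirical risk, stopping at the
    iteration cap or once the risk falls below tol, whichever comes first."""
    theta = np.zeros(X.shape[1])
    risk = np.inf
    for _ in range(max_iters):
        margins = y * (X @ theta)
        risk = np.mean(np.log1p(np.exp(-margins)))  # logistic loss ℓ(m) = log(1 + e^{-m})
        if risk < tol:
            break
        # ∇ℓ via dℓ/dm = -1 / (1 + e^{m}) with m_i = y_i <x_i, θ>
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / len(y)
        theta -= lr * grad
    return theta, risk
```

On linearly separable data the logistic risk decays toward zero but never reaches 10^-12 in 10^3 steps, so the iteration cap binds, matching the "whichever comes first" reading of the quoted criterion.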