Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
General Loss Functions Lead to (Approximate) Interpolation in High Dimensions
Authors: Kuo-Wei Lai, Vidya Muthukumar
JMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Recent empirical and theoretical work provides a mixed and incomplete picture of the impact of loss. On one hand, large-scale empirical studies (Hui and Belkin, 2020; Kline and Berardi, 2005; Golik et al., 2013; Janocha and Czarnecki, 2017) have shown that the less popular squared loss generates surprisingly competitive performance to the popular cross-entropy loss (the multiclass extension of the binary logistic loss). [...] Finally, in Section 4 we provide partial evidence for the tightness of our arguments. First, in Proposition 13 we show that the conditions for exact equivalence in Theorem 4 are not only sufficient but necessary. [...] Figure 3: Panel (a) illustrates the relationship between the vectors q, g^{-1}(q) and 1 for the loss function ℓ(z) = 1/(1-z). Panel (b) is a simulation that compares the implicit bias of gradient descent to the MNI. [...] Figure 4: Panel (a) compares the implicit bias of gradient descent to the one-vs-all MNI. [...] Panel (b) visualizes the normalized training data margins induced by importance weighting on different loss functions in Corollary 14. [...] In Appendix E, we provide corresponding simulations on random data. |
| Researcher Affiliation | Academia | Kuo-Wei Lai EMAIL School of Electrical & Computer Engineering Georgia Institute of Technology Atlanta, GA 30332, USA Vidya Muthukumar EMAIL School of Electrical & Computer Engineering H. Milton School of Industrial & Systems Engineering Georgia Institute of Technology Atlanta, GA 30332, USA |
| Pseudocode | No | The paper describes algorithms and methods through mathematical formulations and textual descriptions but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v26/23-1078.html.' This link refers to attribution requirements for the paper itself, not source code for the methodology presented in the paper. There is no explicit statement about making their code available. |
| Open Datasets | No | The paper describes data generation for simulations, such as 'Assume independent and identically distributed data {(x_i, y_i)}_{i=1}^{n} such that each covariate satisfies one of the following: a) x_i ∼ N(0, Σ), and we denote the spectrum of Σ by λ; or b) x_i = diag(λ)^{1/2} z_i, where z_i has independent entries such that each z_{ij} is mean-zero, unit-variance, and sub-Gaussian with parameter v > 0'. It does not refer to any specific publicly available datasets with access information. |
| Dataset Splits | Yes | Figure 3: [...] Panel (b) is a simulation that compares the implicit bias of gradient descent to the MNI. The covariate-response pairs {(x_i, y_i)}_{i=1}^{n} are independently and identically distributed (IID) with a fixed sample size n = 100 and varying data dimension d, where x_i is isotropic Gaussian and y_i is uniformly distributed in {−1, +1}. [...] Figure 4: Panel (a) compares the implicit bias of gradient descent to the one-vs-all MNI. The simulation setup is the same as Figure 3b with K = 5 classes, and labels drawn uniformly at random in [K]. Panel (b) visualizes the normalized training data margins induced by importance weighting on different loss functions in Corollary 14. We consider the idealized assumption XX^T = αI with n = 100 and d = 5000. The first 70 examples are majority examples, labeled y_i = +1, and the remaining 30 examples are minority examples, labeled y_i = −1. Note that we apply the importance weighting factor Q = 2.0 only to the minority examples. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models, memory specifications, or cloud/cluster configurations. |
| Software Dependencies | No | The paper does not specify any software dependencies, libraries, or their version numbers used for the experiments. |
| Experiment Setup | Yes | Figure 3: Panel (b) is a simulation that compares the implicit bias of gradient descent to the MNI. [...] Gradient descent is run for at most 10^3 iterations, or until the empirical risk falls below 10^-12. [...] Figure 4: Panel (b) visualizes the normalized training data margins induced by importance weighting on different loss functions in Corollary 14. [...] We run gradient descent on different loss functions for at most 10^4 iterations, or until the empirical risk falls below 10^-12. |
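The simulation protocol quoted above (overparameterized linear model, gradient descent from zero, comparison against the minimum-norm interpolator, early stopping once the empirical risk drops below 10^-12) can be sketched as follows. This is a minimal illustration, not the authors' code: the sample size and dimension are reduced for speed, and the squared loss is used as a stand-in for the paper's general losses, since gradient descent from zero initialization then converges to the MNI exactly, which makes the comparison a clean sanity check.

```python
import numpy as np

rng = np.random.default_rng(0)

# Setup in the spirit of the paper's Figure 3b: n samples, d >> n
# dimensions, isotropic Gaussian covariates, labels uniform in {-1, +1}.
# (The paper uses n = 100 and larger d; reduced here for speed.)
n, d = 20, 500
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)

# Minimum-norm interpolator (MNI): least-norm solution of X w = y.
mni = X.T @ np.linalg.solve(X @ X.T, y)

# Gradient descent on the squared loss from zero initialization,
# with the quoted stopping rule: iterate until the empirical risk
# falls below 1e-12, up to a fixed iteration budget.
w = np.zeros(d)
lr = 0.01
for _ in range(2000):
    residual = X @ w - y
    if 0.5 * np.mean(residual**2) < 1e-12:
        break
    w -= lr * (X.T @ residual) / n

cosine = w @ mni / (np.linalg.norm(w) * np.linalg.norm(mni))
rel_dist = np.linalg.norm(w - mni) / np.linalg.norm(mni)
print(f"cosine similarity to MNI: {cosine:.6f}")
print(f"relative distance to MNI: {rel_dist:.2e}")
```

Because the iterates of gradient descent on the squared loss stay in the row space of X, the run converges to the MNI up to the stopping tolerance; for the general losses the paper studies, the alignment is only approximate and improves with dimension.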