Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Any-stepsize Gradient Descent for Separable Data under Fenchel–Young Losses
Authors: Han Bao, Shinsaku Sakaue, Yuki Takezawa
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper studies theoretical aspects of optimization, which hardly face such a challenge. The gradient descent (GD) has been one of the most common optimizers in machine learning. ... We essentially leverage the classical perceptron argument to derive the iteration complexity for achieving ε-optimal loss, which is possible for a majority of Fenchel–Young losses. This convergence result highlights that the self-bounding property may not be necessary for GD to attain arbitrarily small loss. |
| Researcher Affiliation | Collaboration | Han Bao (The Institute of Statistical Mathematics); Shinsaku Sakaue (CyberAgent); Yuki Takezawa (Kyoto University and OIST) |
| Pseudocode | No | The paper only describes steps in regular paragraph text without structured formatting. For example, GD with constant stepsize is written as follows: $w_{t+1} := w_t - \eta \nabla L(w_t)$, for $t = 0, 1, \dots, T - 1$ (GD). |
| Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: Our synthetic experiments reported in Figure 1 are not challenging to reproduce because the dataset and model are extremely small and the problem is convex. |
| Open Datasets | No | Figure 1: Pilot studies of GD with the same toy dataset as [63]. The dataset consists of four points, $x_1 = [1, 0.2]^\top$, $y_1 = 1$, $x_2 = [-2, 0.2]^\top$, $y_2 = 1$, $x_3 = [-1, 0.2]^\top$, $y_3 = -1$, $x_4 = [2, 0.2]^\top$, and $y_4 = -1$. ... All datasets used in the simulation in Figure 1 are synthetic. |
| Dataset Splits | No | The paper uses a very small, synthetic dataset with four points for a pilot study, explicitly listing all data points in Figure 1. There is no mention of training, validation, or test splits; the dataset is used in its entirety for the simulation presented. |
| Hardware Specification | No | Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [No] Justification: Since the synthetic experiments in Figure 1 are extremely small-scale, we do not need a huge amount of computational resources to reproduce them. The experiments can be finished within a minute with a consumer laptop. |
| Software Dependencies | No | The paper describes mathematical algorithms and theoretical analysis related to gradient descent and Fenchel–Young losses. No specific software, libraries, or their version numbers are mentioned for implementation. |
| Experiment Setup | Yes | GD is run with initialization $w_0 = [0, 0]^\top$. ... GD with large stepsize such as $\eta = 24$ remains to converge under the Tsallis $q$-loss (detailed in Section 4), even if the stepsize has gone beyond the classical stable regime. Figure 1: Pilot studies of GD with the same toy dataset as [63]. |
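
Since the paper is reported to contain no pseudocode or released code, the constant-stepsize update quoted in the table, $w_{t+1} := w_t - \eta \nabla L(w_t)$ with zero initialization, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the average logistic loss as a stand-in objective (the Tsallis $q$-loss is not implemented here), a hypothetical linearly separable dataset `X_demo`, `y_demo` that is not the paper's toy dataset, and a small illustrative stepsize `eta=0.5` rather than the large stepsizes the paper studies. All function names are illustrative.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    # Numerically stable 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def loss(w, X, y):
    # Average logistic loss: (1/n) * sum_i log(1 + exp(-y_i <w, x_i>)).
    # Assumption: logistic loss stands in for the losses studied in the paper.
    return sum(softplus(-yi * dot(w, xi)) for xi, yi in zip(X, y)) / len(X)


def grad(w, X, y):
    # Gradient of the average logistic loss:
    # -(1/n) * sum_i y_i * sigmoid(-y_i <w, x_i>) * x_i.
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        coeff = -yi * sigmoid(-yi * dot(w, xi)) / len(X)
        for j, xij in enumerate(xi):
            g[j] += coeff * xij
    return g


def gradient_descent(X, y, eta, T):
    # Constant-stepsize GD: w_{t+1} = w_t - eta * grad L(w_t), t = 0, ..., T - 1.
    w = [0.0] * len(X[0])  # zero initialization, matching w_0 = [0, 0]^T
    for _ in range(T):
        g = grad(w, X, y)
        w = [wj - eta * gj for wj, gj in zip(w, g)]
    return w


# Hypothetical separable data (NOT the dataset from the paper's Figure 1).
X_demo = [(1.0, 0.5), (2.0, 1.0), (-1.0, -0.5), (-2.0, -1.5)]
y_demo = [1, 1, -1, -1]

w_T = gradient_descent(X_demo, y_demo, eta=0.5, T=200)
print("final iterate:", w_T)
print("final loss:", loss(w_T, X_demo, y_demo))
```

Swapping `loss` and `grad` for another differentiable loss leaves `gradient_descent` unchanged, which is why the update rule is kept separate from the objective in this sketch.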