Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Any-stepsize Gradient Descent for Separable Data under Fenchel–Young Losses

Authors: Han Bao, Shinsaku Sakaue, Yuki Takezawa

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper studies theoretical aspects of optimization, which hardly face such a challenge. The gradient descent (GD) has been one of the most common optimizer in machine learning. ... We essentially leverage the classical perceptron argument to derive the iteration complexity for achieving ε-optimal loss, which is possible for a majority of Fenchel–Young losses. This convergence result highlights that the self-bounding property may not be necessary for GD to attain arbitrarily small loss.
Researcher Affiliation | Collaboration | Han Bao, The Institute of Statistical Mathematics, EMAIL; Shinsaku Sakaue, Cyber Agent, EMAIL; Yuki Takezawa, Kyoto University and OIST, EMAIL
Pseudocode | No | The paper only describes steps in regular paragraph text without structured formatting. For example, GD with constant stepsize is written as follows: $w_{t+1} := w_t - \eta \nabla L(w_t)$, for $t = 0, 1, \dots, T - 1$. (GD)
Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: Our synthetic experiments reported in Figure 1 are not challenging to reproduce because the dataset and model are extremely small and the problem is convex.
Open Datasets | No | Figure 1: Pilot studies of GD with the same toy dataset as [63]. The dataset consists of four points, $x_1 = [1, 0.2]^\top$, $y_1 = 1$, $x_2 = [-2, 0.2]^\top$, $y_2 = 1$, $x_3 = [-1, 0.2]^\top$, $y_3 = -1$, $x_4 = [2, 0.2]^\top$, and $y_4 = -1$. ... All datasets used in the simulation in Figure 1 are synthetic.
Dataset Splits | No | The paper uses a very small, synthetic dataset with four points for a pilot study, explicitly listing all data points in Figure 1. There is no mention of training, validation, or test splits; the dataset is used in its entirety for the simulation presented.
Hardware Specification | No | Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [No] Justification: Since the synthetic experiments in Figure 1 are extremely small-scale, we do not need a huge amount of computational resources to reproduce them. The experiments can be finished within a minute with a consumer laptop.
Software Dependencies | No | The paper describes mathematical algorithms and theoretical analysis related to gradient descent and Fenchel–Young losses. No specific software, libraries, or their version numbers are mentioned for implementation.
Experiment Setup | Yes | GD is run with initialization $w_0 = [0, 0]^\top$. ... GD with large stepsize such as $\eta = 24$ remains to converge under the Tsallis q-loss (detailed in Section 4), even if the stepsize has gone beyond the classical stable regime. Figure 1: Pilot studies of GD with the same toy dataset as [63]. ... GD is run with initialization $w_0 = [0, 0]^\top$.
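
Since the paper releases no code, the constant-stepsize update (GD) quoted in the Pseudocode row can be illustrated with a minimal sketch. This is not the authors' implementation. It makes two assumptions: it uses the logistic loss (the Fenchel–Young loss generated by Shannon entropy) in place of the Tsallis q-loss, and it runs on a hypothetical linearly separable four-point dataset chosen only for illustration, not the data listed for Figure 1. The helper names (`margin`, `loss`, `grad`, `gradient_descent`) are hypothetical as well.

```python
import math

# Hypothetical separable toy data (an assumption for illustration only):
# points with a positive second coordinate are labeled +1, the others -1.
X = [[1.0, 0.2], [-2.0, 0.2], [-1.0, -0.2], [2.0, -0.2]]
Y = [1.0, 1.0, -1.0, -1.0]


def margin(w, x, y):
    """Signed margin y * <w, x> of one labeled point."""
    return y * sum(wj * xj for wj, xj in zip(w, x))


def loss(w, X, Y):
    """Mean logistic loss log(1 + exp(-margin)), evaluated without overflow."""
    total = 0.0
    for x, y in zip(X, Y):
        m = margin(w, x, y)
        if m >= 0:
            total += math.log1p(math.exp(-m))
        else:
            total += -m + math.log1p(math.exp(m))
    return total / len(X)


def grad(w, X, Y):
    """Gradient of the mean logistic loss with respect to w."""
    g = [0.0] * len(w)
    for x, y in zip(X, Y):
        m = margin(w, x, y)
        # s = sigmoid(-m), written so that exp never receives a large positive argument
        if m >= 0:
            s = math.exp(-m) / (1.0 + math.exp(-m))
        else:
            s = 1.0 / (1.0 + math.exp(m))
        for j, xj in enumerate(x):
            g[j] -= y * xj * s / len(X)
    return g


def gradient_descent(X, Y, eta, T):
    """Run w_{t+1} = w_t - eta * grad L(w_t) for t = 0, ..., T - 1 from w_0 = 0."""
    w = [0.0] * len(X[0])
    losses = [loss(w, X, Y)]
    for _ in range(T):
        g = grad(w, X, Y)
        w = [wj - eta * gj for wj, gj in zip(w, g)]
        losses.append(loss(w, X, Y))
    return w, losses


if __name__ == "__main__":
    w, losses = gradient_descent(X, Y, eta=0.1, T=200)
    print("final w:", w)
    print("initial loss:", losses[0], "final loss:", losses[-1])
```

The returned `losses` list records the loss at every iterate, so rerunning with a larger `eta` shows whether the decrease stays monotone for this stand-in loss. Reproducing the behavior reported for η = 24 would require the Tsallis q-loss from Section 4 of the paper, which this sketch does not implement.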