Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Descent with Misaligned Gradients and Applications to Hidden Convexity
Authors: Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We obtain an optimization algorithm that identifies an ϵ-suboptimal point with the optimal iteration complexity of O(ϵ⁻²); for the more general case where the changes need not be slow, we obtain an algorithm with O(ϵ⁻³) iteration complexity. As an application of our framework, we consider optimization problems with a hidden convexity property, and obtain an algorithm with O(ϵ⁻³) iteration complexity. Our study is also related to the recent body of work on online optimization with hints, where a weakly correlated hint vector was sufficient to obtain logarithmic regret (Dekel et al., 2017; Bhaskara et al., 2020; 2021); however, in those settings, the algorithm sees the unbiased gradient after each step and can thus correct itself, which is not possible in our setting. In this section, we assume that the misaligned stochastic gradients are obtained by a smooth matrix-based transformation. We start by formally stating the assumptions necessary in this section. (A1) Oracle & Assumptions. Let f be H-Lipschitz and convex. We assume that the algorithm has access to an oracle that generates stochastic gradients as follows: given x, the oracle returns a random vector h(x) with ‖h(x)‖ ≤ H with probability 1 and E[h(x)] = A(x)∇f(x), where A(x) ∈ ℝ^{d×d} denotes an SPD matrix that perturbs the stochastic gradients. As outlined earlier, such a transformation ensures the condition ⟨E[h(x)], ∇f(x)⟩ ≥ 0. We assume that ‖A(x)⁻¹ − A(y)⁻¹‖_op ≤ ρ‖x − y‖₂ for any x, y ∈ ℝ^d and assume that the eigenvalues of A(x) lie in [λ_min, λ_max] for all x ∈ ℝ^d. |
| Researcher Affiliation | Collaboration | Aditya Bhaskara (University of Utah, Salt Lake City, UT); Ashok Cutkosky (Boston University, Boston, MA); Ravi Kumar (Google Research, Mountain View, CA); Manish Purohit (Google Research, Mountain View, CA) |
| Pseudocode | Yes | Algorithm 1 Non-Smooth Optimization with Matrix-Transformed Gradients. Algorithm 2 Smooth Convex Optimization with Misaligned Gradients. Algorithm 3 Optimization of Hidden Convex Functions. |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and convergence proofs. It discusses applications in areas like reinforcement learning and revenue management but does not use specific datasets for empirical evaluation, nor does it provide concrete access information for any open datasets. |
| Dataset Splits | No | The paper is theoretical and does not involve experiments with datasets, thus it does not provide information on dataset splits. |
| Hardware Specification | No | The paper is theoretical, presenting algorithms and their convergence properties, and does not report on experimental results requiring specific hardware. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers required for implementation or experimentation. |
| Experiment Setup | No | The paper describes theoretical algorithms and their convergence rates but does not provide specific experimental setup details such as concrete hyperparameter values or training configurations. The 'Parameter tuning' section states that 'All of our algorithms assume knowledge of various problem parameters (e.g. H, L, or R) that are used to specify optimized learning rates and other hyperparameters. In practice, the learning would need to be adjusted using standard hyperparameter tuning techniques.' |
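To make the oracle model quoted in the Research Type row concrete, here is a minimal sketch — not the paper's Algorithms 1–3, and with a hypothetical test function, fixed SPD matrix, and step size chosen for illustration. It shows why a misaligned gradient h(x) with E[h(x)] = A(x)∇f(x), A(x) SPD, still permits descent: ⟨E[h(x)], ∇f(x)⟩ ≥ 0, so stepping against h(x) makes progress on average.

```python
import numpy as np

# Illustration of the misaligned-gradient oracle model (not the paper's
# algorithms): the oracle returns h(x) with E[h(x)] = A @ grad_f(x) for an
# SPD matrix A, so <E[h(x)], grad_f(x)> >= 0 and plain descent still works.

rng = np.random.default_rng(0)
d = 5

def f(x):
    # Hypothetical smooth convex test function: 0.5 * ||x||^2
    return 0.5 * float(x @ x)

def grad_f(x):
    return x

# Fixed SPD perturbation matrix with eigenvalues in [lambda_min, lambda_max]
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
eigs = np.linspace(0.5, 2.0, d)  # lambda_min = 0.5, lambda_max = 2.0
A = Q @ np.diag(eigs) @ Q.T

def oracle(x):
    # Misaligned stochastic gradient: A @ grad_f(x) plus zero-mean noise
    return A @ grad_f(x) + 0.01 * rng.standard_normal(d)

x = rng.standard_normal(d)
f0 = f(x)
eta = 0.1  # illustrative step size, not a tuned learning rate
for _ in range(200):
    x = x - eta * oracle(x)  # descend along the misaligned gradient

print(f(x) < f0)
```

Because A is SPD, the update direction has a nonnegative inner product with the true gradient in expectation, so the loop drives f down even though it never sees an unbiased gradient; this is the property the quoted assumption (A1) formalizes.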