Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Automatic Differentiation of Optimization Algorithms with Time-Varying Updates
Authors: Sheheryar Mehmood, Peter Ochs
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test our results, we provide numerical demonstration on a few examples from classical Machine Learning. These include lasso regression, that is, ... We solve the three problems through PGD with four different choices of step sizes and APG with fixed step size and βk := (k − 1)/(k + 5) (depicted by APG in Figure 1). ... In Figure 1, the top row shows the median error plots of the five algorithms and the bottom row shows the errors of the corresponding derivatives with the same colour. |
| Researcher Affiliation | Academia | 1Department of Mathematics & Computer Science, Saarland University, Saarbrücken, Germany. Correspondence to: Sheheryar Mehmood <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Proximal Gradient with Extrapolation. Initialization: x(0) = x(−1) ∈ X, u ∈ U, 0 < α_ ≤ ᾱ < 2/L. Parameter: (αk)k∈N ∈ [α_, ᾱ] and (βk)k∈N ∈ [0, 1]. Update for k ≥ 0: y(k) := (1 + βk)x(k) − βk x(k−1); w(k) := y(k) − αk ∇x f(y(k), u); x(k+1) := P_{αk g}(w(k), u). |
| Open Source Code | No | The paper mentions autograd libraries like PyTorch, TensorFlow, and JAX as tools used, but does not provide specific access to the authors' own implementation code for the methodology described. |
| Open Datasets | Yes | We solve (16) for 50 randomly generated datasets, (17) for 50 perturbed instances of MADELON dataset (Dua & Graff, 2017), and (18) for a single instance of CIFAR10 dataset (Krizhevsky, 2009). |
| Dataset Splits | No | For (17), we use MADELON dataset with M = 2,000 samples and N = 501 features. ... For (18), we use CIFAR10 dataset with M = 50,000 samples and N = 32 × 32 × 3 features. The paper specifies the total number of samples for these datasets but does not provide specific training/validation/test splits. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or memory) are provided in the paper for running the experiments. |
| Software Dependencies | No | A crucial advantage of AD is that it provides a nice blackbox implementation thanks to the powerful autograd libraries included in PyTorch (Paszke et al., 2019), TensorFlow (Abadi et al., 2016), and JAX (Bradbury et al., 2018). While these software packages are mentioned, no specific version numbers are provided for their usage in the experiments. |
| Experiment Setup | Yes | We solve the three problems through PGD with four different choices of step sizes and APG with fixed step size and βk := (k − 1)/(k + 5) (depicted by APG in Figure 1). ... For each problem, we run PGD with four different choices of step size, namely, (i) αk = 2/(L + m) for (17) and αk = 1/L for (16), (ii) αk ∈ U(0, 2/3L), (iii) αk ∈ U(2/3L, 4/3L), and (iv) αk ∈ U(4/3L, 2/L), for each k ∈ N. We also run APG with αk = 1/L and βk = (k − 1)/(k + 5). Before starting each algorithm, we obtain w(0) ∈ B_{10^−2}(w∗) by partially solving each problem through APG. |
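For context, the extrapolated proximal gradient update quoted in the Pseudocode and Experiment Setup rows can be sketched in NumPy for the lasso case. This is a minimal illustration, not the authors' implementation: the function names (`soft_threshold`, `apg_lasso`) and the problem instance are hypothetical, and it uses the paper's APG configuration αk = 1/L, βk = (k − 1)/(k + 5).

```python
import numpy as np

def soft_threshold(w, tau):
    # Proximal operator of tau * ||.||_1 (soft-thresholding).
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def apg_lasso(A, b, lam, num_iters=500):
    # Proximal gradient with extrapolation (cf. Algorithm 1) applied to
    # lasso: min_x 0.5 * ||A x - b||^2 + lam * ||x||_1,
    # with alpha_k = 1/L and beta_k = (k - 1)/(k + 5) as in the paper's
    # APG configuration.
    L = np.linalg.norm(A, 2) ** 2           # Lipschitz constant of the smooth part
    x_prev = x = np.zeros(A.shape[1])       # x(0) = x(-1), as in the initialization
    for k in range(num_iters):
        beta = (k - 1) / (k + 5)            # harmless at k = 0 since x(0) = x(-1)
        y = (1 + beta) * x - beta * x_prev  # extrapolation step
        grad = A.T @ (A @ y - b)            # gradient of 0.5 * ||A y - b||^2
        w = y - grad / L                    # forward (gradient) step
        x_prev, x = x, soft_threshold(w, lam / L)  # backward (proximal) step
    return x
```

Because every step is a smooth-plus-prox composition expressed with array operations, the same loop written in PyTorch or JAX can be differentiated end-to-end by the autograd libraries the paper cites.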