Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Sensitivity-Free Gradient Descent Algorithms

Authors: Ion Matei, Maksym Zhenirovskyy, Johan de Kleer, John Maxwell

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
LLM Response: "We test the proposed algorithms on the problem of learning the parameters of the Cucker-Smale model. The algorithms are compared with gradient descent algorithms based on ODE solvers endowed with sensitivity analysis capabilities. We show that the proposed algorithms are at least 4x faster when implemented in Pytorch, and at least 16x faster when implemented in Jax. For large versions of the Cucker-Smale model, the Jax implementation is thousands of times faster. Our algorithms generate more accurate results both on training and test data."
Researcher Affiliation: Industry
LLM Response: "Ion Matei (EMAIL), Maksym Zhenirovskyy (EMAIL), Johan de Kleer (EMAIL), John Maxwell (EMAIL), all with the Intelligent Systems Laboratory, PARC, part of SRI International, Palo Alto, CA 94304, USA"
Pseudocode: Yes
LLM Response: "Algorithm 1 Gradient Descent with Sensitivities-Enabled ODE Solver.
Require: α: stepsize
Require: x_0: initial state vector
Require: θ_0: initial parameter vector
k ← 0
while θ_k not converged do
    k ← k + 1
    θ_k ← θ_{k-1} - α ∇_θ L(θ_{k-1})
end while
return θ_k"
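The update quoted above is plain gradient descent on L(θ). A minimal runnable sketch, using an invented quadratic loss as a stand-in (in the paper, the gradient would instead come from an ODE solver with sensitivity analysis):

```python
import numpy as np

def gradient_descent(grad_L, theta0, alpha=0.1, tol=1e-8, max_iter=10_000):
    """Outer loop of Algorithm 1: theta_k = theta_{k-1} - alpha * grad L(theta_{k-1})."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = alpha * grad_L(theta)
        theta = theta - step
        if np.linalg.norm(step) < tol:  # stop once the update becomes negligible
            break
    return theta

# Stand-in loss L(theta) = ||theta - c||^2, so grad L(theta) = 2 * (theta - c);
# the minimizer is theta = c.
c = np.array([1.0, -2.0])
theta_star = gradient_descent(lambda th: 2.0 * (th - c), theta0=[0.0, 0.0])
```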
Open Source Code: No
LLM Response: "The paper discusses the use of existing libraries like Pytorch, Jax, and torchdiffeq, but does not provide an explicit statement or link for the release of the authors' own implementation of the described algorithms."
Open Datasets: No
LLM Response: "We evaluated Algorithms 1-3 on the problem of learning the parameters of the particle-based Cucker-Smale ODE model. Various versions of this model can be found in Carrillo et al. (2010). This model is nonlinear, and we can easily increase the state vector size by increasing the number of particles. We used one time series as training data, generated with the same, random, initial parameters, for all cases. The training loss function is the sum of squared errors (SSE). The test data consists of model trajectories generated with random initial conditions."
Dataset Splits: Yes
LLM Response: "We generated a dataset of 6000, 2-dimensional spirals, each starting at a different point, sampled at 200 equally-spaced time steps. We use the first 3000 for training, and the remaining 3000 for testing."
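A hypothetical way to materialize such a split; the spiral-generation details below are invented for illustration, and only the 6000-series, 200-step, first-3000-train shape follows the quoted evidence:

```python
import numpy as np

rng = np.random.default_rng(0)
n_series, n_steps = 6000, 200
t = np.linspace(0.5, 4.0 * np.pi, n_steps)  # 200 equally-spaced time steps

# Invented generative process: 2-D spirals with radius r = t, each rotated by
# a random phase so every series begins at a different point in the plane.
phases = rng.uniform(0.0, 2.0 * np.pi, size=n_series)
xs = t[None, :] * np.cos(t[None, :] + phases[:, None])
ys = t[None, :] * np.sin(t[None, :] + phases[:, None])
spirals = np.stack([xs, ys], axis=-1)  # shape (6000, 200, 2)

# Split exactly as quoted: first 3000 for training, remaining 3000 for testing.
train, test = spirals[:3000], spirals[3000:]
```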
Hardware Specification: Yes
LLM Response: "The training and testing were done on a PC with Intel 12 core Xeon 3.5 GHz CPU with 64 GB of RAM, and an NVIDIA GEFORCE RTX 2080 Ti GPU card."
Software Dependencies: No
LLM Response: "The paper mentions software like Pytorch, Jax, torchdiffeq, the Dopri5 ODE solver, scipy.integrate.solve_ivp, and the SUNDIALS family, but does not provide specific version numbers for these key components."
Experiment Setup: Yes
LLM Response: "We ran the algorithm for 5k epochs and evaluated the four metrics for all combinations of algorithms and numbers of particles. The Pytorch implementation details of the three algorithms are shown in Table 1. In the case of Algorithm 2, we used the SGD and Adam algorithms for updating the state and the parameters, respectively. Both algorithms used a stepsize lr = 0.01. In the case of Algorithm 3, we used the same combination of algorithms, but the stepsize for SGD is lr = 1, so that the product of the two stepsizes is 0.01. The optimization algorithms' implementation details are shown in Table 4. In the case of Algorithm 2, for stability reasons, we changed the stepsizes for the SGD and Adam algorithms while making sure that their product equals the stepsize product used in the Pytorch implementation, i.e., 0.01. We set the maximum number of iterations to 4000, and used Adam, with a constant learning rate of 0.001, as the optimization algorithm."
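The Algorithm 2 stepsize configuration (SGD on the states, Adam on the parameters, both with lr = 0.01) can be sketched on a toy coupled quadratic; the loss and the `adam_step` helper below are stand-ins for illustration, not the paper's ODE training setup:

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update with bias-corrected first/second moment estimates.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    theta = theta - lr * (m / (1 - b1**t)) / (np.sqrt(v / (1 - b2**t)) + eps)
    return theta, m, v

target = np.array([1.0, 2.0, 3.0])
x = np.zeros(3)        # "states": updated with plain SGD, lr = 0.01
theta = np.zeros(3)    # "parameters": updated with Adam, lr = 0.01
m, v = np.zeros(3), np.zeros(3)
lr_sgd = 0.01

# Toy loss L = ||x - theta||^2 + ||theta - target||^2, minimized at
# x = theta = target; both variable blocks are updated each iteration.
for step in range(1, 5001):
    g_x = 2.0 * (x - theta)                                 # dL/dx
    g_theta = -2.0 * (x - theta) + 2.0 * (theta - target)   # dL/dtheta
    x = x - lr_sgd * g_x                                    # SGD step on states
    theta, m, v = adam_step(theta, g_theta, m, v, step)     # Adam step on parameters
```

With a constant learning rate, Adam settles into a small oscillation around the optimum, so checks on the result should allow a tolerance on the order of the stepsize.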