Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Deep learning for continuous-time stochastic control with jumps
Authors: Patrick Cheridito, Jean-Loup Dupret, Donatien Hainaut
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations on different problems illustrate the accuracy and scalability of our approach, demonstrating its effectiveness in solving complex high-dimensional stochastic control tasks. Code is available at https://github.com/jdupret97/Deep-Learning-for-CT-Stochastic-Control-with-Jumps. [...] In this paper, we introduce a deep model-based approach for stochastic control problems with jumps that takes the system dynamics (2) into account by leveraging the HJB equation (3). [...] We illustrate the accuracy and scalability of our approach in different numerical examples and provide comparisons with popular RL and deep-learning control methods. |
| Researcher Affiliation | Academia | Patrick Cheridito Department of Mathematics ETH Zurich, Switzerland Jean-Loup Dupret Department of Mathematics ETH Zurich, Switzerland Donatien Hainaut LIDAM-ISBA UCLouvain, Belgium EMAIL, {donatien.hainaut}@uclouvain.be |
| Pseudocode | Yes | Algorithm 1 GPI-PINN [...] Algorithm 2 GPI-CBU |
| Open Source Code | Yes | Code is available at https://github.com/jdupret97/Deep-Learning-for-CT-Stochastic-Control-with-Jumps. |
| Open Datasets | No | The paper uses mathematical models (Linear-quadratic regulator with jumps, Optimal consumption-investment with jumps) for its experiments, defining dynamics and parameters. It does not explicitly use or provide access to external, publicly available datasets. The data for experiments is generated based on these models. |
| Dataset Splits | Yes | Figure 1 compares the performances of GPI-PINN and GPI-CBU on a 10-dimensional ($d = 10$) LQR problem [...] It shows mean absolute errors of $V_{\theta^{(k)}}$ with respect to $V$ given by $\mathrm{MAE}_V = \frac{1}{M} \sum_{i=1}^{M} \lvert V_{\theta^{(k)}}(t_i, x_i) - V(t_i, x_i) \rvert$ on a test set of size $M$ uniformly sampled from $[0, 1] \times [-2.5, 2.5]^d$ together with runtimes as functions of the number of epochs $k$. [...] For both Algorithms 1 and 2, we sample the time and space points independently from the uniform distributions $U[0, T]$ and $U[0, y_b]$, respectively. The parameters of the optimal investment problem below are as follows: $T = 1$, $y_b = 150$, $r = 0.02$, $\rho = 0.045$, $\delta = 0.7$, $\lambda = 0.45\,\mathbf{1}_n$, $\mu_Z = 0.25\,\mathbf{1}_n$, $\sigma_Z = 0.2\,\mathbf{1}_n$, $\mu = 0.032\,\mathbf{1}_n$, $\sigma = I_n$, $\Sigma = 0.2\,\mathbf{1}_{n \times n}$ with $\mathrm{diag}(\Sigma) = \mathbf{1}_n$. |
| Hardware Specification | Yes | Algorithms 1 and 2 were implemented using TensorFlow and Keras with GPU acceleration on an NVIDIA RTX 4090. |
| Software Dependencies | No | Algorithms 1 and 2 were implemented using TensorFlow and Keras with GPU acceleration on an NVIDIA RTX 4090. The network parameters are updated using Adam (Kingma & Ba, 2014). The paper mentions software tools like TensorFlow and Keras, and an optimizer Adam, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Throughout the numerical examples of the paper, we use L = 3 with N = 50 neurons in each of the DGM layers, see Figure 5. In the value network, we use tanh for the activation function σV and softplus for σVout. For the control network, we adopt tanh for σα, while as output activation σαout we choose the identity in Example 5.1 and softplus in Example 5.2. Unless stated otherwise, we use a number of sample points M1, M2 equal to 256, a number of gradient steps N1, N2 equal to 64 and a maximum number of epochs k = 1500. The network parameters are updated using Adam (Kingma & Ba, 2014) with constant learning rates η1, η2 equal to 0.001. [...] In our experiments, we set ζ = 1 as it provides a good trade-off between convergence speed and accuracy of the improvements in GPI-CBU. |
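
The test-set error quoted in the Dataset Splits row can be illustrated with a minimal sketch: draw M points uniformly from [0, 1] × [-2.5, 2.5]^d and average the absolute difference between an approximate and a reference value function. This is not the authors' code. The helper names `sample_test_set` and `mean_absolute_error`, the test-set size of 1000, the seed, and both stand-in value functions are assumptions made only for illustration; the actual implementation is in the linked repository.

```python
import random


def sample_test_set(m, d, t_max=1.0, x_bound=2.5, seed=0):
    """Draw m points (t, x) uniformly from [0, t_max] x [-x_bound, x_bound]^d."""
    rng = random.Random(seed)
    return [
        (rng.uniform(0.0, t_max),
         [rng.uniform(-x_bound, x_bound) for _ in range(d)])
        for _ in range(m)
    ]


def mean_absolute_error(v_approx, v_ref, points):
    """MAE_V = (1/M) * sum_i |v_approx(t_i, x_i) - v_ref(t_i, x_i)|."""
    return sum(abs(v_approx(t, x) - v_ref(t, x)) for t, x in points) / len(points)


# Hypothetical stand-ins, NOT the paper's value functions: a simple
# quadratic "reference" and an "approximation" that is off by a constant.
def v_ref(t, x):
    return (1.0 - t) + sum(xi * xi for xi in x)


def v_approx(t, x):
    return v_ref(t, x) + 0.01


# d = 10 matches the LQR example quoted above; m = 1000 is an assumed size.
points = sample_test_set(m=1000, d=10)
mae = mean_absolute_error(v_approx, v_ref, points)
print(f"MAE on {len(points)} test points: {mae:.4f}")
```

In the paper's setting, `v_approx` would be the trained value network after k epochs and `v_ref` the known solution of the LQR problem; the sketch only shows how the metric and the uniform test set fit together.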