Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Consensus Based Stochastic Optimal Control
Authors: Liyao Lyu, Jingrun Chen
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical results confirm the accuracy and scalability of our approach across various problem dimensions and show the potential for extension to mean-field control problems. ... We evaluate the performance of the Adam-CBO method across various problem settings, including the linear quadratic control problem in 1, 2, 4, 8, and 16 dimensions, the Ginzburg-Landau model, and the systemic risk mean-field control problem with 50, 100, 200, 400, 800 agents. |
| Researcher Affiliation | Academia | ¹Department of Computational Mathematics, Science & Engineering, Michigan State University, MI 48824, USA; ²School of Mathematical Sciences and Suzhou Institute for Advanced Research, University of Science and Technology of China, and Suzhou Big Data & AI Research and Engineering Center, Suzhou 215127, China. |
| Pseudocode | Yes | Algorithm 1: Consensus Based Optimization with Momentum; Algorithm 2: Consensus-based Optimization with Adaptive Momentum |
| Open Source Code | Yes | Our code is available at https://github.com/Lyuliyao/ADAM_CBO_control. |
| Open Datasets | Yes | We also compare our method with DDPG, PPO, SAC, TD3, TQC, and CrossQ (using the stable-baselines3 implement https://github.com/araffin/sbx) on Pendulum-v1 as well as PPO and DQN on CartPole-v1. |
| Dataset Splits | Yes | The control policy is initially trained using a delta distribution centered on x0 and n = 100 and then tested against different values of n = 50, 100, 200, 400, 800. Furthermore, the value function is evaluated by taking the expectation of controlled dynamics starting from different initial distributions µ0, including Gaussian random variable x0 ~ N(0, 0.1), mixture of two Gaussian random variables x0 = P( k + θy) + (1 - P)(k + θz) with P a Bernoulli random variable with parameter 1/3 10 , θ = 0.1, y, z ~ N(0, 1) and mixture of three Gaussian random variables: x0 = [ k 3U =0 + k 3U =1] + θy with k = 0.3, θ = 0.07, y ~ N(0, 1). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models. It focuses on the algorithmic and software aspects of the experiments. |
| Software Dependencies | No | We also compare our method with DDPG, PPO, SAC, TD3, TQC, and CrossQ (using the stable-baselines3 implement https://github.com/araffin/sbx) on Pendulum-v1 as well as PPO and DQN on CartPole-v1. While 'stable-baselines3' is mentioned, specific version numbers for this or any other software dependencies are not provided. |
| Experiment Setup | Yes | In both methods, the number of SDE to compute the value function is 64 and the learning rate is 1e-2. In M-CBO and Adam-CBO methods, the number of agents is specified as N = 5000, and M = 50 agents are randomly selected to update in each step. We investigate the LQG problem in dimension d = 1, 2, 4, 8, and 16, with a terminal time of T = 1 and a timestep of T/20. ... We start with a simple case with d = 2, µ = 10, λ = 0.2. ... We test the performance of our method with parameters c = 2, k = 0.6, and η = 2. |
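
The Pseudocode and Experiment Setup rows refer to consensus-based optimization in which a random batch of M agents out of N is updated at each step. As a rough illustration of that idea only, the sketch below implements a plain consensus-based optimization loop with random batches on a toy objective. It is not the paper's Algorithm 1 or Algorithm 2 (it has no momentum or adaptive momentum terms), and the function names, the Gibbs weighting parameter `beta`, the `drift` and `noise` coefficients, the initialization range, and all default values are assumptions chosen for a small example rather than settings taken from the paper.

```python
import math
import random


def consensus_point(batch, objective, beta):
    """Weighted mean of a batch using Gibbs weights exp(-beta * f(x))."""
    values = [objective(x) for x in batch]
    f_min = min(values)  # shift by the minimum so the best agent has weight 1
    weights = [math.exp(-beta * (v - f_min)) for v in values]
    total = sum(weights)
    dim = len(batch[0])
    return [sum(w * x[k] for w, x in zip(weights, batch)) / total
            for k in range(dim)]


def cbo_minimize(objective, dim, n_agents=200, batch_size=50, steps=500,
                 step_size=0.1, drift=1.0, noise=0.5, beta=30.0, seed=0):
    """Plain CBO with random batches (illustrative defaults, not the paper's)."""
    rng = random.Random(seed)
    # Hypothetical initialization: agents uniform in [-3, 3]^dim.
    agents = [[rng.uniform(-3.0, 3.0) for _ in range(dim)]
              for _ in range(n_agents)]
    for _ in range(steps):
        # Select a random batch; only these agents are updated in this step.
        idx = rng.sample(range(n_agents), batch_size)
        center = consensus_point([agents[i] for i in idx], objective, beta)
        for i in idx:
            for k in range(dim):
                diff = agents[i][k] - center[k]
                # Drift toward the consensus point plus component-wise noise
                # that shrinks as the agent approaches the consensus point.
                agents[i][k] += (-drift * step_size * diff
                                 + noise * math.sqrt(step_size) * diff
                                 * rng.gauss(0.0, 1.0))
    # Report the consensus point of the whole population.
    return consensus_point(agents, objective, beta)
```

A call such as `cbo_minimize(lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2, dim=2)` returns a two-element list that should lie near the minimizer of the toy objective. The reported setting of N = 5000 agents with M = 50 updated per step would correspond to `n_agents=5000, batch_size=50`, but matching the paper's results would require the authors' code at the repository linked in the table.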