Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent

Authors: Adrien Bolland, Ioannis Boukas, Mathias Berger, Damien Ernst

JAIR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
We assess the performance of our algorithm in three environments concerned with the design and control of a mass-spring-damper system, a small-scale off-grid power system and a drone, respectively. In addition, our algorithm is benchmarked against a state-of-the-art deep reinforcement learning algorithm used to tackle joint design and control problems. We show that DEPS performs at least as well or better in all three environments, consistently yielding solutions with higher returns in fewer iterations.
Researcher Affiliation: Academia
Adrien Bolland (EMAIL), Ioannis Boukas (EMAIL), Mathias Berger (EMAIL), Damien Ernst (EMAIL), Montefiore Institute, University of Liège, Liège, Belgium
Pseudocode: Yes
Algorithm 1 summarizes the steps performed in the DEPS algorithm. The execution of the projected stochastic gradient ascent algorithm for optimizing the objective in equation (5) is shown in more detail in Algorithm 2 in Appendix B.
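The paper's Algorithms 1 and 2 are not reproduced in this report, but the core update they rely on can be illustrated generically. The sketch below is a minimal, hypothetical implementation of projected stochastic gradient ascent, assuming a simple box-constrained feasible set (clipping) and a user-supplied noisy gradient estimator; the paper's actual projection operator and gradient estimator are defined by its objective in equation (5).

```python
import random


def project(phi, lo, hi):
    # Euclidean projection onto the box [lo, hi] (a simple, hypothetical
    # feasible set; the paper's projection depends on its constraint set).
    return [min(max(p, l), h) for p, l, h in zip(phi, lo, hi)]


def projected_sga(grad_estimate, phi, lo, hi, alpha=0.1, iters=100):
    # Projected stochastic gradient ascent: take a step along a noisy
    # gradient estimate, then project back onto the feasible set.
    for _ in range(iters):
        g = grad_estimate(phi)
        phi = [p + alpha * gi for p, gi in zip(phi, g)]
        phi = project(phi, lo, hi)
    return phi


# Toy objective: maximize -(x - 2)^2 subject to x in [0, 1].
# The constrained maximizer sits on the boundary at x = 1.
random.seed(0)
noisy_grad = lambda phi: [-2.0 * (phi[0] - 2.0) + random.gauss(0, 0.01)]
result = projected_sga(noisy_grad, [0.5], [0.0], [1.0])
```

On this toy problem the iterates climb toward the unconstrained optimum at 2 and are repeatedly clipped to the boundary, so the returned parameter settles at the constrained maximizer x = 1.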
Open Source Code: Yes
The implementation of our algorithm and of the different benchmarks is provided in the following GitHub repository: https://github.com/adrienBolland/Jointly-Learning-Environments-and-Control-Policies-with-Projected-Stochastic-Gradient-Ascent
Open Datasets: No
The paper defines three environments, 'mass-spring-damper system', 'small-scale off-grid power system' and 'drone', and details their parameters and dynamics in Appendices D, E, and F. For example, in Appendix E, Table 9 presents 'Electrical load consumption and PV production capacity factor data', which are parameters for their simulated microgrid environment. No external public datasets are explicitly mentioned with access information.
Dataset Splits: No
The expected return is computed from 64 Monte-Carlo samples (i.e., by sampling 64 i.i.d. trajectories). In addition, we note that the different algorithms we use for selecting policies are stochastic. Hence, we naturally report the average expected return of the pair of policies and environments computed by those algorithms, which is estimated by averaging the performance over ten runs (random seeds) of those algorithms.
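The evaluation protocol quoted above (64 i.i.d. trajectories per estimate, averaged over ten random seeds) can be sketched as follows. The `simulate` closure here is a hypothetical placeholder for a rollout of a learned environment/policy pair; only the averaging structure mirrors the paper's description.

```python
import random


def expected_return(seed, num_trajectories=64):
    # Monte-Carlo estimate of the expected return from 64 i.i.d.
    # trajectory returns, as in the quoted protocol. `simulate` is a
    # stand-in: it draws a noisy scalar return around a true value of 1.0.
    rng = random.Random(seed)
    simulate = lambda: 1.0 + rng.gauss(0, 0.5)
    return sum(simulate() for _ in range(num_trajectories)) / num_trajectories


# Average the estimate over ten independent runs (random seeds),
# mirroring how the paper reports performance.
per_seed = [expected_return(seed) for seed in range(10)]
avg = sum(per_seed) / len(per_seed)
```

With 10 × 64 samples of standard deviation 0.5, the seed-averaged estimate concentrates tightly around the true expected return of 1.0.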
Hardware Specification: No
The paper does not provide specific hardware details such as CPU/GPU models, memory, or specific computing platforms used for running the experiments.
Software Dependencies: No
Furthermore, the Adam algorithm (Kingma & Ba, 2014) is used for updating (ψ, θ). (Section 5.2.1) In our implementation, PyTorch handles this automatically. (Appendix E) OpenAI Gym library (Brockman et al., 2016). (Introduction) None of these mentions specific version numbers.
Experiment Setup: Yes
The gradients are estimated on batches of M = 64 trajectories and the step size α of the Adam algorithm is chosen equal to 0.005 for both the environment and the policy gradients. We retain the default values for the other parameters of the Adam algorithm. Moreover, the inputs of the policy are z-normalized using mean vector (xref, 0, 0) and standard deviation vector (0.005, 0.02, 100), which is an approximation of the standard deviation vector of the states collected over high-performing trajectories. (Section 5.2.1) The parameters of the JODC algorithm are provided in Tables 11, 12 and 13 for the MSD, the microgrid and the drone environments, respectively.
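The z-normalization of policy inputs described in the quote above is a fixed affine transform with the stated mean and standard-deviation vectors. A minimal sketch, assuming a three-component state and a hypothetical reference position `x_ref` (the paper's value is environment-specific):

```python
def z_normalize(state, mean, std):
    # z-normalize each state component: (x - mean) / std, using the fixed
    # mean and standard-deviation vectors quoted from Section 5.2.1.
    return [(x - m) / s for x, m, s in zip(state, mean, std)]


x_ref = 0.1  # hypothetical reference position, for illustration only
mean = (x_ref, 0.0, 0.0)
std = (0.005, 0.02, 100.0)  # standard-deviation vector from the quote
normalized = z_normalize((0.105, 0.01, 50.0), mean, std)
```

Each component is rescaled to roughly unit order: a 0.005 offset in the first component, a 0.01 value in the second, and a 50.0 value in the third all map to values near 1.0, 0.5, and 0.5, respectively.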