Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent

Authors: Adrien Bolland, Ioannis Boukas, Mathias Berger, Damien Ernst

JAIR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
We assess the performance of our algorithm in three environments concerned with the design and control of a mass-spring-damper system, a small-scale off-grid power system and a drone, respectively. In addition, our algorithm is benchmarked against a state-of-the-art deep reinforcement learning algorithm used to tackle joint design and control problems. We show that DEPS performs at least as well or better in all three environments, consistently yielding solutions with higher returns in fewer iterations.
Researcher Affiliation: Academia
Adrien Bolland (EMAIL), Ioannis Boukas (EMAIL), Mathias Berger (EMAIL), Damien Ernst (EMAIL), Montefiore Institute, University of Liège, Liège, Belgium
Pseudocode: Yes
Algorithm 1 summarizes the steps performed in the DEPS algorithm. The execution of the projected stochastic gradient ascent algorithm for optimizing the objective in equation (5) is shown in more detail in Algorithm 2 in Appendix B.
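The paper's Algorithms 1 and 2 are not reproduced in this report, but the core update they rely on can be illustrated generically. The sketch below is a minimal, hypothetical implementation of projected stochastic gradient ascent, assuming a simple box-constrained feasible set (clipping) and a user-supplied noisy gradient estimator; the paper's actual projection operator and gradient estimator are defined by its objective in equation (5).

```python
import random


def project(phi, lo, hi):
    # Euclidean projection onto the box [lo, hi] (a simple, hypothetical
    # feasible set; the paper's projection depends on its constraint set).
    return [min(max(p, l), h) for p, l, h in zip(phi, lo, hi)]


def projected_sga(grad_estimate, phi, lo, hi, alpha=0.1, iters=100):
    # Projected stochastic gradient ascent: take a step along a noisy
    # gradient estimate, then project back onto the feasible set.
    for _ in range(iters):
        g = grad_estimate(phi)
        phi = [p + alpha * gi for p, gi in zip(phi, g)]
        phi = project(phi, lo, hi)
    return phi


# Toy objective: maximize -(x - 2)^2 subject to x in [0, 1].
# The constrained maximizer sits on the boundary at x = 1.
random.seed(0)
noisy_grad = lambda phi: [-2.0 * (phi[0] - 2.0) + random.gauss(0, 0.01)]
result = projected_sga(noisy_grad, [0.5], [0.0], [1.0])
```

On this toy problem the iterates climb toward the unconstrained optimum at 2 and are repeatedly clipped to the boundary, so the returned parameter settles at the constrained maximizer x = 1.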
Open Source Code: Yes
The implementation of our algorithm and of the different benchmarks is provided in the following GitHub repository: https://github.com/adrienBolland/Jointly-Learning-Environments-and-Control-Policies-with-Projected-Stochastic-Gradient-Ascent
Open Datasets: No
The paper defines three environments, 'mass-spring-damper system', 'small-scale off-grid power system' and 'drone', and details their parameters and dynamics in Appendices D, E, and F. For example, in Appendix E, Table 9 presents 'Electrical load consumption and PV production capacity factor data', which are parameters for their simulated microgrid environment. No external public datasets are explicitly mentioned with access information.
Dataset Splits: No
The expected return is computed from 64 Monte-Carlo samples (i.e., by sampling 64 i.i.d. trajectories). In addition, we note that the different algorithms we use for selecting policies are stochastic. Hence, we naturally report the average expected return of the pair of policies and environments computed by those algorithms, which is estimated by averaging the performance over ten runs (random seeds) of those algorithms.
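The evaluation protocol quoted above (64 i.i.d. trajectories per estimate, averaged over ten random seeds) can be sketched as follows. The `simulate` closure here is a hypothetical placeholder for a rollout of a learned environment/policy pair; only the averaging structure mirrors the paper's description.

```python
import random


def expected_return(seed, num_trajectories=64):
    # Monte-Carlo estimate of the expected return from 64 i.i.d.
    # trajectory returns, as in the quoted protocol. `simulate` is a
    # stand-in: it draws a noisy scalar return around a true value of 1.0.
    rng = random.Random(seed)
    simulate = lambda: 1.0 + rng.gauss(0, 0.5)
    return sum(simulate() for _ in range(num_trajectories)) / num_trajectories


# Average the estimate over ten independent runs (random seeds),
# mirroring how the paper reports performance.
per_seed = [expected_return(seed) for seed in range(10)]
avg = sum(per_seed) / len(per_seed)
```

With 10 × 64 samples of standard deviation 0.5, the seed-averaged estimate concentrates tightly around the true expected return of 1.0.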
Hardware Specification: No
The paper does not provide specific hardware details such as CPU/GPU models, memory, or specific computing platforms used for running the experiments.
Software Dependencies: No
Furthermore, the Adam algorithm (Kingma & Ba, 2014) is used for updating (ψ, θ). (Section 5.2.1) In our implementation, PyTorch handles this automatically. (Appendix E) OpenAI Gym library (Brockman et al., 2016). (Introduction) None of these mentions specific version numbers.
Experiment Setup: Yes
The gradients are estimated on batches of M = 64 trajectories and the step size α of the Adam algorithm is chosen equal to 0.005 for both the environment and the policy gradients. We retain the default values for the other parameters of the Adam algorithm. Moreover, the inputs of the policy are z-normalized using mean vector (xref, 0, 0) and standard deviation vector (0.005, 0.02, 100), which is an approximation of the standard deviation vector of the states collected over high-performing trajectories. (Section 5.2.1) The parameters of the JODC algorithm are provided in Tables 11, 12 and 13 for the MSD, the microgrid and the drone environments, respectively.
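The z-normalization of policy inputs described in the quote above is a fixed affine transform with the stated mean and standard-deviation vectors. A minimal sketch, assuming a three-component state and a hypothetical reference position `x_ref` (the paper's value is environment-specific):

```python
def z_normalize(state, mean, std):
    # z-normalize each state component: (x - mean) / std, using the fixed
    # mean and standard-deviation vectors quoted from Section 5.2.1.
    return [(x - m) / s for x, m, s in zip(state, mean, std)]


x_ref = 0.1  # hypothetical reference position, for illustration only
mean = (x_ref, 0.0, 0.0)
std = (0.005, 0.02, 100.0)  # standard-deviation vector from the quote
normalized = z_normalize((0.105, 0.01, 50.0), mean, std)
```

Each component is rescaled to roughly unit order: a 0.005 offset in the first component, a 0.01 value in the second, and a 50.0 value in the third all map to values near 1.0, 0.5, and 0.5, respectively.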