Verifying Reinforcement Learning up to Infinity

Authors: Edoardo Bacci, Mirco Giacobbe, David Parker

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate its efficacy on a range of benchmark control problems." and "We evaluate our method over multiple agents for 3 benchmark control problems: a bouncing ball, automated cruise control, and cart-pole. ... Results are shown in Tab. 1."
Researcher Affiliation | Academia | Edoardo Bacci¹, Mirco Giacobbe², David Parker¹; ¹University of Birmingham, ²University of Oxford
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks; the methodology is described using text and mathematical equations.
Open Source Code | Yes | https://github.com/phate09/SafeRL_Infinity
Open Datasets | Yes | "We evaluate our method over multiple agents for 3 benchmark control problems: a bouncing ball, automated cruise control, and cart-pole. We used standard feed forward architectures... [Jaeger et al., 2019; Tran et al., 2020; Brockman et al., 2016]."
Dataset Splits | No | The paper describes training RL agents in simulated environments but does not provide specific training/validation/test dataset splits, as it is not a traditional supervised learning setup with fixed datasets.
Hardware Specification | Yes | "We ran our experiments on a 4-core 4.2GHz with 64GB RAM."
Software Dependencies | No | The paper mentions using "proximal policy optimisation (PPO)" and "OpenAI Gym" but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | "We used standard feed forward architectures with 2 hidden layers of size 64 (32 for the bouncing ball), and ReLU activation functions; we used a learning rate of 5e-4." and "We terminate training either when our agent reaches a mean reward of 900 or after 5M training steps." (see the sketch below the table)
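
The quoted setup (PPO on OpenAI Gym-style control tasks, 2 hidden layers of 64 units, ReLU activations, learning rate 5e-4, up to 5M training steps) can be sketched roughly as below. This is a hedged reconstruction, not the authors' code: the paper names PPO and OpenAI Gym but not the RL library, so the use of Stable-Baselines3 is an assumption, cart-pole is the only one of the three benchmarks with a standard Gym id (the bouncing-ball and cruise-control environments are custom), and the reward-900 stopping rule is only noted, not implemented.

```python
# Minimal sketch of the reported training configuration.
# Assumptions: Stable-Baselines3 as the PPO implementation and CartPole-v1
# as the environment; only the hyperparameters below are quoted in the paper.
import gym
import torch
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")

model = PPO(
    "MlpPolicy",
    env,
    learning_rate=5e-4,              # quoted learning rate
    policy_kwargs=dict(
        net_arch=[64, 64],           # 2 hidden layers of size 64 (quoted)
        activation_fn=torch.nn.ReLU, # ReLU activations (quoted)
    ),
    verbose=1,
)

# The paper stops training at a mean reward of 900 or after 5M steps;
# here only the step budget is reproduced (the reward-based stopping
# criterion would require an additional evaluation callback).
model.learn(total_timesteps=5_000_000)
```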