Verifying Reinforcement Learning up to Infinity
Authors: Edoardo Bacci, Mirco Giacobbe, David Parker
IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate its efficacy on a range of benchmark control problems." and "We evaluate our method over multiple agents for 3 benchmark control problems: a bouncing ball, automated cruise control, and cart-pole. ... Results are shown in Tab. 1" |
| Researcher Affiliation | Academia | Edoardo Bacci (University of Birmingham), Mirco Giacobbe (University of Oxford), David Parker (University of Birmingham) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. The methodology is described using text and mathematical equations. |
| Open Source Code | Yes | https://github.com/phate09/Safe_RL_Infinity |
| Open Datasets | Yes | "We evaluate our method over multiple agents for 3 benchmark control problems: a bouncing ball, automated cruise control, and cart-pole. We used standard feed forward architectures..." [Jaeger et al., 2019; Tran et al., 2020; Brockman et al., 2016] |
| Dataset Splits | No | The paper describes training RL agents in simulated environments but does not provide specific training/validation/test dataset splits as it's not a traditional supervised learning setup with fixed datasets. |
| Hardware Specification | Yes | "We ran our experiments on a 4-core 4.2GHz [machine] with 64GB RAM." |
| Software Dependencies | No | The paper mentions using 'proximal policy optimisation (PPO)' and 'OpenAI Gym' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | "We used standard feed forward architectures with 2 hidden layers of size 64 (32 for the bouncing ball), and ReLU activation functions; we used a learning rate of 5e-4." and "We terminate training either when our agent reaches a mean reward of 900 or after 5M training steps." A sketch of this configuration appears after the table. |
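
The reported setup (PPO-trained feedforward policies with two hidden layers of 64 units, ReLU activations, a learning rate of 5e-4, and training stopped at a mean reward of 900 or after 5M steps) can be summarised in a minimal sketch. This assumes a PyTorch implementation; `make_policy`, the cart-pole input/output dimensions, and the Adam optimizer are illustrative assumptions rather than the authors' released code, and the PPO update loop itself is omitted.

```python
import torch
import torch.nn as nn

# Illustrative reconstruction of the reported policy network: a standard
# feedforward architecture with 2 hidden layers of 64 units (32 for the
# bouncing-ball benchmark) and ReLU activations.
def make_policy(obs_dim: int, n_actions: int, hidden: int = 64) -> nn.Module:
    return nn.Sequential(
        nn.Linear(obs_dim, hidden),
        nn.ReLU(),
        nn.Linear(hidden, hidden),
        nn.ReLU(),
        nn.Linear(hidden, n_actions),
    )

# Reported hyperparameters: learning rate 5e-4; training terminates once the
# agent reaches a mean reward of 900 or after 5M training steps.
policy = make_policy(obs_dim=4, n_actions=2)  # cart-pole dimensions (assumed)
optimizer = torch.optim.Adam(policy.parameters(), lr=5e-4)

MAX_TRAINING_STEPS = 5_000_000
TARGET_MEAN_REWARD = 900
```

Passing `hidden=32` to `make_policy` would give the smaller network the paper reports for the bouncing-ball benchmark.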