Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Verifying Reinforcement Learning up to Infinity
Authors: Edoardo Bacci, Mirco Giacobbe, David Parker
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We demonstrate its efficacy on a range of benchmark control problems." and "We evaluate our method over multiple agents for 3 benchmark control problems: a bouncing ball, automated cruise control, and cart-pole. ... Results are shown in Tab. 1" |
| Researcher Affiliation | Academia | Edoardo Bacci¹, Mirco Giacobbe², David Parker¹; ¹University of Birmingham, ²University of Oxford |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. The methodology is described using text and mathematical equations. |
| Open Source Code | Yes | https://github.com/phate09/Safe RL Infinity |
| Open Datasets | Yes | We evaluate our method over multiple agents for 3 benchmark control problems: a bouncing ball, automated cruise control, and cart-pole. We used standard feed forward architectures... [Jaeger et al., 2019; Tran et al., 2020; Brockman et al., 2016]. |
| Dataset Splits | No | The paper describes training RL agents in simulated environments but does not provide training/validation/test splits; RL training is not a supervised learning setup with fixed datasets. |
| Hardware Specification | Yes | We ran our experiments on a 4-core 4.2GHz with 64GB RAM. |
| Software Dependencies | No | The paper mentions using 'proximal policy optimisation (PPO)' and 'OpenAI Gym' but does not provide version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | "We used standard feed forward architectures with 2 hidden layers of size 64 (32 for the bouncing ball), and ReLU activation functions; we used a learning rate of 5e-4." and "We terminate training either when our agent reaches a mean reward of 900 or after 5M training steps." |
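The Experiment Setup row pins down enough hyperparameters to restate them as a small configuration. The following is a minimal Python sketch, not the authors' code: the constant and function names, the benchmark keys (`"bouncing_ball"`, `"cart_pole"`), the reading of "reaches a mean reward of 900" as `>=`, and the assumption that the same threshold applies to every benchmark are all illustrative. It records layer sizes and the stopping rule only; it does not implement PPO training or the ReLU forward pass.

```python
from dataclasses import dataclass

# Values quoted in the Experiment Setup row; the names are hypothetical.
LEARNING_RATE = 5e-4
TARGET_MEAN_REWARD = 900.0
MAX_TRAINING_STEPS = 5_000_000


@dataclass(frozen=True)
class PolicyArchitecture:
    """Layer sizes of a feed-forward policy with two equal-width hidden layers.

    The paper uses ReLU activations; this sketch only tracks layer shapes.
    """
    obs_dim: int
    n_outputs: int
    hidden: int

    def layer_sizes(self) -> list[int]:
        # input -> hidden -> hidden -> output
        return [self.obs_dim, self.hidden, self.hidden, self.n_outputs]

    def num_parameters(self) -> int:
        # Weights plus biases for each fully connected layer.
        sizes = self.layer_sizes()
        return sum(i * o + o for i, o in zip(sizes, sizes[1:]))


def hidden_width(benchmark: str) -> int:
    # 32 units for the bouncing ball, 64 for the other benchmarks.
    return 32 if benchmark == "bouncing_ball" else 64


def should_stop(mean_reward: float, steps: int) -> bool:
    # Assumption: "reaches a mean reward of 900" is read as mean_reward >= 900.
    return mean_reward >= TARGET_MEAN_REWARD or steps >= MAX_TRAINING_STEPS
```

For example, with illustrative dimensions of 4 observations and 2 outputs, `PolicyArchitecture(4, 2, hidden_width("cart_pole")).layer_sizes()` gives `[4, 64, 64, 2]` and 4610 trainable parameters. The actual input and output sizes for each benchmark are not stated in the excerpts above.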