Search-Based Testing of Reinforcement Learning
Authors: Martin Tappler, Filip Cano Córdoba, Bernhard K. Aichernig, Bettina Könighofer
IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our testing framework on trained deep RL agents for the video game Super Mario Bros., trained with varying numbers of training episodes and different action sets. |
| Researcher Affiliation | Collaboration | 1 Institute of Software Technology, Graz University of Technology; 2 Institute of Applied Information Processing and Communications, Graz University of Technology; 3 TU Graz-SAL DES Lab, Silicon Austria Labs, Graz, Austria; 4 Lamarr Security Research |
| Pseudocode | Yes | Algorithm 1: Search for Reference Trace τref |
| Open Source Code | Yes | Details on the learning parameters along with more experiments and the source code are included in the technical appendix. |
| Open Datasets | Yes | We use a third-party RL agent that operates in the Open AI gym environment and uses double deep Q-Networks [Feng et al., 2020]; the environment is Open AI Gym [Brockman et al., 2016]. |
| Dataset Splits | No | The paper mentions 'training for 80k episodes' and 'testing' but does not specify explicit train/validation/test dataset splits, percentages, or sample counts for reproducibility. In an RL context, this would typically mean reporting episode counts or interaction steps for each phase, which the paper does not detail as splits. |
| Hardware Specification | No | The paper mentions 'a standard laptop' and 'a dedicated cluster' for training, and acknowledges 'HPC resources provided by the ZID of Graz University of Technology', but none of these provide specific hardware details such as GPU/CPU models, processor types, or memory amounts. |
| Software Dependencies | No | The paper mentions the use of 'Open AI gym' and implies 'PyTorch' through a citation to a PyTorch tutorial, but it does not provide specific version numbers for these or any other software dependencies required for replication. |
| Experiment Setup | Yes | We fuzzed for 50 generations with a population of 50, used a mutation stop-probability p_m^stop = 0.2 with effect size m_s = 15 and fitness weights λ_cov = 2, λ_pos = 1.5, and λ_neg = 1, to focus on exploration. With a crossover probability of 0.25, we mostly rely on mutations. For safety testing, we use 10 repetitions and a test length of l = 40, and for performance testing we use n_test = n_ep = 10 and step width w = 20. |
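
The experiment-setup row above is concrete enough to be captured as a configuration. The sketch below collects the quoted fuzzing and testing hyperparameters and shows how a generational fuzzing loop could consume them. It is a minimal illustration under our own assumptions: every name (`FuzzConfig`, `fuzz`, `fitness`, `mutate`, `crossover`) is hypothetical and not taken from the authors' implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class FuzzConfig:
    """Hyperparameters quoted from the paper's experiment setup (field names are ours)."""
    generations: int = 50
    population_size: int = 50
    mutation_stop_prob: float = 0.2    # p_m^stop
    mutation_effect_size: int = 15     # m_s
    lambda_cov: float = 2.0            # fitness weight for coverage
    lambda_pos: float = 1.5            # fitness weight for positive objective
    lambda_neg: float = 1.0            # fitness weight for negative objective
    crossover_prob: float = 0.25
    # Test-execution settings reported alongside the fuzzing parameters
    safety_repetitions: int = 10
    safety_test_length: int = 40       # l
    perf_n_test: int = 10              # n_test
    perf_n_ep: int = 10                # n_ep
    perf_step_width: int = 20          # w


def fuzz(cfg: FuzzConfig, seed_traces, fitness, mutate, crossover):
    """Generic generational loop; `fitness`, `mutate`, `crossover` are problem-specific stubs."""
    population = list(seed_traces)[: cfg.population_size]
    for _ in range(cfg.generations):
        scored = sorted(population, key=fitness, reverse=True)
        next_gen = scored[: cfg.population_size // 2]  # keep the fitter half
        while len(next_gen) < cfg.population_size:
            parent = random.choice(scored)
            if random.random() < cfg.crossover_prob:
                child = crossover(parent, random.choice(scored))
            else:
                child = mutate(parent, cfg.mutation_stop_prob, cfg.mutation_effect_size)
            next_gen.append(child)
        population = next_gen
    return population
```

In this reading, the quoted fitness weights λ_cov, λ_pos, and λ_neg would enter through the `fitness` callback, combining coverage with positively and negatively weighted objectives; the safety- and performance-testing fields are carried along only to document the reported values, not because the loop above uses them.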