Search-Based Testing of Reinforcement Learning

Authors: Martin Tappler, Filip Cano Córdoba, Bernhard K. Aichernig, Bettina Könighofer

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our testing framework on trained deep RL agents for the video game Super Mario Bros., trained with varying numbers of training episodes and different action sets."
Researcher Affiliation | Collaboration | 1. Institute of Software Technology, Graz University of Technology; 2. Institute of Applied Information Processing and Communications, Graz University of Technology; 3. TU Graz-SAL DES Lab, Silicon Austria Labs, Graz, Austria; 4. Lamarr Security Research
Pseudocode | Yes | "Algorithm 1: Search for Reference Trace τref"
Open Source Code | Yes | "Details on the learning parameters along with more experiments and the source code are included in the technical appendix."
Open Datasets | Yes | "We use a third-party RL agent that operates in the Open AI gym environment and uses double deep Q-Networks [Feng et al., 2020]" and "Open AI Gym [Brockman et al., 2016]"
Dataset Splits | No | The paper mentions 'training for 80k episodes' and 'testing' but does not specify explicit train/validation/test splits, percentages, or sample counts. In an RL context, reproducibility would typically require episode or interaction-step counts for each phase, and these are not reported as splits.
Hardware Specification | No | The paper mentions 'a standard laptop' and 'a dedicated cluster' for training, and acknowledges 'HPC resources provided by the ZID of Graz University of Technology', but none of these provide specific hardware details such as GPU/CPU models, processor types, or memory amounts.
Software Dependencies | No | The paper mentions the use of 'Open AI gym' and implies 'PyTorch' through a citation to a PyTorch tutorial, but it does not provide specific version numbers for these or any other software dependencies required for replication.
Experiment Setup | Yes | "We fuzzed for 50 generations with a population of 50, used a mutation stop-probability pmstop = 0.2 with effect size ms = 15 and fitness weights λcov = 2, λpos = 1.5, and λneg = 1, to focus on exploration. With a crossover probability of 0.25, we mostly rely on mutations. For safety testing, we use 10 repetitions and a test length of l = 40 and for performance testing we use ntest = nep = 10 and step width w = 20."
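The quoted setup describes a genetic fuzzer: a population evolved over generations by mutation (applied repeatedly until a stop-probability fires), occasional crossover, and a fitness that weights coverage against positive and negative objectives. The sketch below is an illustrative reconstruction under stated assumptions, not the authors' implementation: the trace representation, the mutation, crossover, and fitness functions are hypothetical placeholders, while the numeric values (50 generations, population 50, crossover probability 0.25, mutation stop-probability 0.2, effect size 15, weights 2 / 1.5 / 1) come from the quoted text.

```python
import random

# Hyperparameters quoted from the paper's experiment setup.
N_GENERATIONS = 50
POPULATION_SIZE = 50
P_CROSSOVER = 0.25        # crossover probability; otherwise mutate
P_MUTATION_STOP = 0.2     # pmstop: chance of stopping further mutation
MUTATION_EFFECT = 15      # ms: effect size of a single mutation
W_COV, W_POS, W_NEG = 2.0, 1.5, 1.0  # λcov, λpos, λneg

def fitness(trace):
    """Hypothetical fitness: weighted sum of coverage and pos/neg scores."""
    return W_COV * trace["cov"] + W_POS * trace["pos"] + W_NEG * trace["neg"]

def mutate(trace, rng):
    """Mutate repeatedly; each round continues with probability 1 - pmstop,
    so the number of edits is geometrically distributed."""
    trace = dict(trace)
    while rng.random() > P_MUTATION_STOP:
        key = rng.choice(("cov", "pos", "neg"))
        trace[key] += rng.uniform(-MUTATION_EFFECT, MUTATION_EFFECT)
    return trace

def crossover(a, b, rng):
    """Hypothetical uniform crossover over trace features."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in a}

def fuzz(rng):
    """Evolve a population and return the fittest candidate."""
    population = [{"cov": rng.random(), "pos": rng.random(), "neg": rng.random()}
                  for _ in range(POPULATION_SIZE)]
    for _ in range(N_GENERATIONS):
        parents = sorted(population, key=fitness, reverse=True)[:POPULATION_SIZE // 2]
        children = []
        while len(children) < POPULATION_SIZE:
            a, b = rng.sample(parents, 2)
            # With P_CROSSOVER = 0.25, the fuzzer "mostly relies on mutations".
            child = crossover(a, b, rng) if rng.random() < P_CROSSOVER else mutate(a, rng)
            children.append(child)
        population = children
    return max(population, key=fitness)

best = fuzz(random.Random(0))
```

In the actual tool, candidates are input traces for the RL agent and fitness rewards coverage of new behavior; here the abstract "cov"/"pos"/"neg" scores stand in for those measurements.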