Search-Based Testing of Reinforcement Learning

Authors: Martin Tappler, Filip Cano Córdoba, Bernhard K. Aichernig, Bettina Könighofer

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our testing framework on trained deep RL agents for the video game Super Mario Bros., trained with varying numbers of training episodes and different action sets."
Researcher Affiliation | Collaboration | 1. Institute of Software Technology, Graz University of Technology; 2. Institute of Applied Information Processing and Communications, Graz University of Technology; 3. TU Graz-SAL DES Lab, Silicon Austria Labs, Graz, Austria; 4. Lamarr Security Research
Pseudocode | Yes | "Algorithm 1: Search for Reference Trace τref"
Open Source Code | Yes | "Details on the learning parameters along with more experiments and the source code are included in the technical appendix."
Open Datasets | Yes | "We use a third-party RL agent that operates in the Open AI gym environment and uses double deep Q-Networks [Feng et al., 2020]" and "Open AI Gym [Brockman et al., 2016]"
Dataset Splits | No | The paper mentions 'training for 80k episodes' and 'testing' but does not specify explicit train/validation/test splits, percentages, or sample counts. In an RL context, reproducibility would typically require episode or interaction-step counts for each phase, and these are not reported as splits.
Hardware Specification | No | The paper mentions 'a standard laptop' and 'a dedicated cluster' for training, and acknowledges 'HPC resources provided by the ZID of Graz University of Technology', but none of these provide specific hardware details such as GPU/CPU models, processor types, or memory amounts.
Software Dependencies | No | The paper mentions the use of 'Open AI gym' and implies 'PyTorch' through a citation to a PyTorch tutorial, but it does not provide specific version numbers for these or any other software dependencies required for replication.
Experiment Setup | Yes | "We fuzzed for 50 generations with a population of 50, used a mutation stop-probability pmstop = 0.2 with effect size ms = 15 and fitness weights λcov = 2, λpos = 1.5, and λneg = 1, to focus on exploration. With a crossover probability of 0.25, we mostly rely on mutations. For safety testing, we use 10 repetitions and a test length of l = 40 and for performance testing we use ntest = nep = 10 and step width w = 20."
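The quoted setup describes a genetic fuzzer: a population evolved over generations by mutation (applied repeatedly until a stop-probability fires), occasional crossover, and a fitness that weights coverage against positive and negative objectives. The sketch below is an illustrative reconstruction under stated assumptions, not the authors' implementation: the trace representation, the mutation, crossover, and fitness functions are hypothetical placeholders, while the numeric values (50 generations, population 50, crossover probability 0.25, mutation stop-probability 0.2, effect size 15, weights 2 / 1.5 / 1) come from the quoted text.

```python
import random

# Hyperparameters quoted from the paper's experiment setup.
N_GENERATIONS = 50
POPULATION_SIZE = 50
P_CROSSOVER = 0.25        # crossover probability; otherwise mutate
P_MUTATION_STOP = 0.2     # pmstop: chance of stopping further mutation
MUTATION_EFFECT = 15      # ms: effect size of a single mutation
W_COV, W_POS, W_NEG = 2.0, 1.5, 1.0  # λcov, λpos, λneg

def fitness(trace):
    """Hypothetical fitness: weighted sum of coverage and pos/neg scores."""
    return W_COV * trace["cov"] + W_POS * trace["pos"] + W_NEG * trace["neg"]

def mutate(trace, rng):
    """Mutate repeatedly; each round continues with probability 1 - pmstop,
    so the number of edits is geometrically distributed."""
    trace = dict(trace)
    while rng.random() > P_MUTATION_STOP:
        key = rng.choice(("cov", "pos", "neg"))
        trace[key] += rng.uniform(-MUTATION_EFFECT, MUTATION_EFFECT)
    return trace

def crossover(a, b, rng):
    """Hypothetical uniform crossover over trace features."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in a}

def fuzz(rng):
    """Evolve a population and return the fittest candidate."""
    population = [{"cov": rng.random(), "pos": rng.random(), "neg": rng.random()}
                  for _ in range(POPULATION_SIZE)]
    for _ in range(N_GENERATIONS):
        parents = sorted(population, key=fitness, reverse=True)[:POPULATION_SIZE // 2]
        children = []
        while len(children) < POPULATION_SIZE:
            a, b = rng.sample(parents, 2)
            # With P_CROSSOVER = 0.25, the fuzzer "mostly relies on mutations".
            child = crossover(a, b, rng) if rng.random() < P_CROSSOVER else mutate(a, rng)
            children.append(child)
        population = children
    return max(population, key=fitness)

best = fuzz(random.Random(0))
```

In the actual tool, candidates are input traces for the RL agent and fitness rewards coverage of new behavior; here the abstract "cov"/"pos"/"neg" scores stand in for those measurements.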