Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Adaptive Stress Testing: Finding Likely Failure Events with Reinforcement Learning

Authors: Ritchie Lee, Ole J. Mengshoel, Anshu Saksena, Ryan W. Gardner, Daniel Genin, Joshua Silbermann, Michael Owen, Mykel J. Kochenderfer

JAIR 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the effectiveness of the approach on an aircraft collision avoidance application, where a prototype aircraft collision avoidance system is stress tested to find the most likely scenarios of near mid-air collision. [...] Section 5 presents the results of analyzing near mid-air collisions in an aircraft collision avoidance system. [...] 5.7 Performance Comparison with Direct Monte Carlo Simulation
Researcher Affiliation	Collaboration	Ritchie Lee EMAIL NASA Ames Research Center, Moffett Field, CA 94035; Ole J. Mengshoel EMAIL Norwegian University of Science and Technology NO-7491, Trondheim, Norway; Anshu Saksena EMAIL, Ryan W. Gardner EMAIL, Daniel Genin EMAIL, Joshua Silbermann EMAIL Johns Hopkins University Applied Physics Laboratory 11100 Johns Hopkins Rd., Baltimore, MD 20723; Michael Owen EMAIL MIT Lincoln Laboratory, 244 Wood St., Lexington, MA 02421; Mykel J. Kochenderfer EMAIL Stanford University, 496 Lomita Mall, Stanford, CA, 94305
Pseudocode	Yes	Algorithm 1 MCTS for seed-action simulators
Open Source Code	Yes	Our implementation of AST is available as an open source Julia package at https://github.com/sisl/AdaptiveStressTesting.jl.
Open Datasets	Yes	In our experiments, pairwise (two-aircraft) encounters are initialized using the Lincoln Laboratory Correlated Aircraft Encounter Model (LLCEM) (Kochenderfer et al., 2010, 2008). LLCEM is a statistical model learned from a large body of radar data of the entire national airspace.
Dataset Splits	No	The paper does not provide specific training/test/validation dataset splits. It describes scenarios or 'encounters' that are initialized from a model (LLCEM) or a star model, and then searched, but not pre-defined splits of a static dataset.
Hardware Specification	Yes	The experiments were performed on a laptop with an Intel i7 4700HQ quad-core processor and 32 GB of memory.
Software Dependencies	No	The paper states, "Our implementation of AST is available as an open source Julia package..." but does not specify version numbers for Julia or any other key software components, libraries, or solvers used in the implementation.
Experiment Setup	Yes	Table 3: Single-threat configuration [...] maximum steps: 50; iterations: 2000; exploration constant: 100.0; k: 0.5; α: 0.85. Table 4: Multi-threat configuration [...] maximum steps: 50; iterations: 1000; exploration constant: 100.0; k: 0.5; α: 0.85.
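
The settings quoted in the Experiment Setup row can be written down as a small configuration object, which may help anyone attempting to rerun the search. The sketch below is in Python rather than the paper's Julia, and all names (`SearchConfig`, `may_add_action`, `uct_score`) are hypothetical. It assumes that k and α are progressive-widening parameters and that the exploration constant enters a standard UCT score; the quoted excerpt does not state either, so treat both functions as illustrations of common MCTS conventions, not as the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchConfig:
    """Search settings as quoted in the Experiment Setup row."""
    max_steps: int
    iterations: int
    exploration_constant: float
    k: float
    alpha: float


# Values copied from the quoted Table 3 and Table 4 excerpts.
SINGLE_THREAT = SearchConfig(max_steps=50, iterations=2000,
                             exploration_constant=100.0, k=0.5, alpha=0.85)
MULTI_THREAT = SearchConfig(max_steps=50, iterations=1000,
                            exploration_constant=100.0, k=0.5, alpha=0.85)


def may_add_action(num_children: int, visits: int, cfg: SearchConfig) -> bool:
    """ASSUMPTION: progressive widening, expand while |A(s)| < k * N(s)^alpha."""
    return num_children < cfg.k * visits ** cfg.alpha


def uct_score(q: float, parent_visits: int, child_visits: int,
              cfg: SearchConfig) -> float:
    """ASSUMPTION: standard UCT, value estimate plus an exploration bonus."""
    bonus = math.sqrt(math.log(parent_visits) / child_visits)
    return q + cfg.exploration_constant * bonus
```

Under this assumed widening rule, a node visited 100 times has a threshold of 0.5 · 100^0.85 ≈ 25.06, so the test passes with 25 children and fails with 26. The two configurations differ only in the iteration count (2000 versus 1000).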