Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Adaptive Stress Testing: Finding Likely Failure Events with Reinforcement Learning
Authors: Ritchie Lee, Ole J. Mengshoel, Anshu Saksena, Ryan W. Gardner, Daniel Genin, Joshua Silbermann, Michael Owen, Mykel J. Kochenderfer
JAIR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of the approach on an aircraft collision avoidance application, where a prototype aircraft collision avoidance system is stress tested to find the most likely scenarios of near mid-air collision. [...] Section 5 presents the results of analyzing near mid-air collisions in an aircraft collision avoidance system. [...] 5.7 Performance Comparison with Direct Monte Carlo Simulation |
| Researcher Affiliation | Collaboration | Ritchie Lee (NASA Ames Research Center, Moffett Field, CA 94035); Ole J. Mengshoel (Norwegian University of Science and Technology, NO-7491, Trondheim, Norway); Anshu Saksena, Ryan W. Gardner, Daniel Genin, Joshua Silbermann (Johns Hopkins University Applied Physics Laboratory, 11100 Johns Hopkins Rd., Baltimore, MD 20723); Michael Owen (MIT Lincoln Laboratory, 244 Wood St., Lexington, MA 02421); Mykel J. Kochenderfer (Stanford University, 496 Lomita Mall, Stanford, CA 94305) |
| Pseudocode | Yes | Algorithm 1 MCTS for seed-action simulators |
| Open Source Code | Yes | Our implementation of AST is available as an open source Julia package at https://github.com/sisl/AdaptiveStressTesting.jl. |
| Open Datasets | Yes | In our experiments, pairwise (two-aircraft) encounters are initialized using the Lincoln Laboratory Correlated Aircraft Encounter Model (LLCEM) (Kochenderfer et al., 2010, 2008). LLCEM is a statistical model learned from a large body of radar data of the entire national airspace. |
| Dataset Splits | No | The paper does not provide specific training/test/validation dataset splits. It describes scenarios or 'encounters' that are initialized from a model (LLCEM) or a star model, and then searched, but not pre-defined splits of a static dataset. |
| Hardware Specification | Yes | The experiments were performed on a laptop with an Intel i7 4700HQ quad-core processor and 32 GB of memory. |
| Software Dependencies | No | The paper states, "Our implementation of AST is available as an open source Julia package..." but does not specify version numbers for Julia or any other key software components, libraries, or solvers used in the implementation. |
| Experiment Setup | Yes | Table 3: Single-threat configuration [...] maximum steps: 50; iterations: 2000; exploration constant: 100.0; k: 0.5; α: 0.85. Table 4: Multi-threat configuration [...] maximum steps: 50; iterations: 1000; exploration constant: 100.0; k: 0.5; α: 0.85. |
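
For readers who want to record the two reported search configurations in a reproducible form, the values quoted in the Experiment Setup row can be captured as a small Python sketch. The class name `MCTSConfig`, its field names, and the helper `differing_fields` are hypothetical and chosen only for illustration; they are not taken from the paper or from the authors' Julia package. Only the numeric values come from the row above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MCTSConfig:
    """Hypothetical container for the search parameters quoted from Tables 3 and 4."""
    max_steps: int               # "maximum steps"
    iterations: int              # "iterations"
    exploration_constant: float  # "exploration constant"
    k: float                     # "k" as reported
    alpha: float                 # "α" as reported


# Values as quoted in the Experiment Setup row.
SINGLE_THREAT = MCTSConfig(max_steps=50, iterations=2000,
                           exploration_constant=100.0, k=0.5, alpha=0.85)
MULTI_THREAT = MCTSConfig(max_steps=50, iterations=1000,
                          exploration_constant=100.0, k=0.5, alpha=0.85)


def differing_fields(a: MCTSConfig, b: MCTSConfig) -> dict:
    """Return {field name: (value in a, value in b)} for fields that differ."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if getattr(a, f.name) != getattr(b, f.name)
    }


if __name__ == "__main__":
    print(differing_fields(SINGLE_THREAT, MULTI_THREAT))
```

Comparing the two objects shows that, among the quoted parameters, the configurations differ only in the number of iterations (2000 versus 1000).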