Neural Bridge Sampling for Evaluating Safety-Critical Autonomous Systems

Authors: Aman Sinha, Matthew O'Kelly, Russ Tedrake, John C. Duchi

NeurIPS 2020

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "Finally, we demonstrate the efficacy of our approach on a variety of scenarios, illustrating its usefulness as a tool for rapid sensitivity analysis and model comparison that are essential to developing and testing safety-critical autonomous systems." A major focus of this work is empirical; accordingly, Section 4 empirically demonstrates the superiority of neural bridge sampling over competing techniques in a variety of applications.

Researcher Affiliation | Academia | Aman Sinha (Stanford University, amans@stanford.edu); Matthew O'Kelly (University of Pennsylvania, mokelly@seas.upenn.edu); Russ Tedrake (Massachusetts Institute of Technology, russt@mit.edu); John C. Duchi (Stanford University, jduchi@stanford.edu)

Pseudocode | Yes | Algorithm 1: Neural bridge sampling

Open Source Code | No | No explicit statement or link found regarding the release of open-source code for the described methodology.

Open Datasets | Yes | "We evaluate a formally-verified neural network controller [48] on the OpenAI Gym continuous Mountain Car environment [67, 17] under a domain perturbation. ... comparing two algorithms on the OpenAI Gym Car Racing environment (which requires a surrogate model for gradients) [55]."

Dataset Splits | No | The paper discusses simulation environments and probability distributions for defining scenarios, but does not provide specific training/validation/test dataset splits with percentages or sample counts.

Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts) are provided for running the experiments.

Software Dependencies | No | The paper mentions software components such as OpenAI Gym and masked autoregressive flows (MAFs), but does not provide specific version numbers for any software dependencies.

Experiment Setup | No | "All methods are given the same computational budget as measured by evaluations of the simulator. This varies from 50,000-100,000 queries to run Algorithm 1 as determined by pγ (see Appendix C for details of each experiment's hyperparameters)."
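Since the report highlights "Algorithm 1: Neural bridge sampling" but the paper's code is not released, the following is a minimal, hypothetical sketch of the classical bridge-sampling identity underlying such methods: estimating the ratio of normalizing constants Z1/Z0 from samples of two overlapping distributions via a geometric bridge, Z1/Z0 = E_p0[sqrt(q1/q0)] / E_p1[sqrt(q0/q1)]. This is not the paper's Algorithm 1 (which adds learned warping/flows and MCMC between intermediate levels); all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized densities: q0 is N(0, 1) and q1 is N(1, 1),
# both written without their 1/sqrt(2*pi) normalizers.
def q0(x):
    return np.exp(-0.5 * x**2)

def q1(x):
    return np.exp(-0.5 * (x - 1.0) ** 2)

n = 200_000
x0 = rng.normal(0.0, 1.0, n)  # samples from p0
x1 = rng.normal(1.0, 1.0, n)  # samples from p1

# Geometric-bridge estimator of Z1/Z0:
#   Z1/Z0 = E_p0[sqrt(q1/q0)] / E_p1[sqrt(q0/q1)]
num = np.mean(np.sqrt(q1(x0) / q0(x0)))
den = np.mean(np.sqrt(q0(x1) / q1(x1)))
ratio = num / den

# Both densities share the same normalizer sqrt(2*pi),
# so the true ratio Z1/Z0 is exactly 1.
print(ratio)
```

The estimator is accurate only when p0 and p1 overlap substantially; when they do not, methods like the paper's interpose a ladder of intermediate distributions and estimate the ratio between each adjacent pair.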