Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Adv-BMT: Bidirectional Motion Transformer for Safety-Critical Traffic Scenario Generation

Authors: Yuxin Liu, Zhenghao (Mark) Peng, Xuanhao Cui, Bolei Zhou

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experimental results validate the quality of generated collision scenarios by Adv-BMT: training in our augmented dataset would reduce episode collision rates by 20%.
Researcher Affiliation	Academia	Yuxin Liu, Zhenghao Peng, Xuanhao Cui, Bolei Zhou; University of California, Los Angeles
Pseudocode	No	The paper describes the methodology in prose and mathematical formulations. There are no explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures present in the document.
Open Source Code	Yes	Demo and code are available at https://metadriverse.github.io/adv-bmt/.
Open Datasets	Yes	All experiments use driving data from the Waymo Open Motion Dataset (WOMD) [4] with formats managed by ScenarioNet [8].
Dataset Splits	Yes	The training set contains 500 real-world scenarios randomly selected from the WOMD training set. We assess policy performance across two distinct validation environments: (1) 100 Waymo validation environments, which consist of unmodified real-world driving scenarios from WOMD validation set, and (2) 100 Adv-BMT environments, which is the augmented collision scenarios from the 100 validation scenes.
Hardware Specification	Yes	During training, we use 8 NVIDIA RTX A6000 GPUs for our model training and fine-tunings.
Software Dependencies	No	The paper mentions tools and platforms like 'MetaDrive [9]' and 'ScenarioNet [8]', but does not specify version numbers for any software dependencies, programming languages, or libraries used in the experiments.
Experiment Setup	Yes	Table 7: BMT training settings (forward prediction). Training steps: 10E6; batch size: 2; training time: 185 h; sampling top-p: 0.95; sampling temperature: 1.0; learning rate: 3E-4. Table 8: RL training settings (TD3). Discounted factor: 0.99; train batch size: 1024; learning rate: 1E-4; policy delay: 200; target network: 0.005.
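
For readers who want to carry the reported settings into their own experiments, the values in the Experiment Setup row can be collected into a small configuration object. The following is a minimal sketch, not the authors' code: the class names and field names are hypothetical, and only the numeric values come from the row above. The meaning of the "Target Network" value is not stated in the row, so the field is kept under that label without interpretation.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BMTTrainingSettings:
    """Hypothetical container for the values listed under Table 7."""
    training_steps: float = 10e6        # listed as "10E6"
    batch_size: int = 2
    training_time_hours: int = 185
    sampling_top_p: float = 0.95
    sampling_temperature: float = 1.0
    learning_rate: float = 3e-4


@dataclass(frozen=True)
class TD3TrainingSettings:
    """Hypothetical container for the values listed under Table 8."""
    discounted_factor: float = 0.99
    train_batch_size: int = 1024
    learning_rate: float = 1e-4
    policy_delay: int = 200
    target_network: float = 0.005       # label as listed; its meaning is not stated in the row


if __name__ == "__main__":
    # Print both groups of settings as plain dictionaries.
    for settings in (BMTTrainingSettings(), TD3TrainingSettings()):
        print(type(settings).__name__, asdict(settings))
```

Frozen dataclasses are used here so that the reported values cannot be changed by accident once loaded; any other configuration format would serve equally well.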