Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Reasoning Multi-Agent Behavioral Topology for Interactive Autonomous Driving

Authors: Haochen Liu, Li Chen, Yu Qiao, Chen Lv, Hongyang Li

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive verification on large-scale real-world datasets, including nuPlan and WOMD, demonstrates that BeTop achieves state-of-the-art performance in both prediction and planning tasks.
Researcher Affiliation	Collaboration	Haochen Liu1,2 Li Chen2,3 Yu Qiao2 Chen Lv1 Hongyang Li2,3 1 Nanyang Technological University 2 Shanghai AI Lab 3 University of Hong Kong
Pseudocode	No	The paper describes the model architecture and processes using diagrams and textual descriptions, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code	Yes	Code and model is available at https://github.com/OpenDriveLab/BeTop.
Open Datasets	Yes	Extensive verification on large-scale real-world datasets, including nuPlan and WOMD, demonstrates that BeTop achieves state-of-the-art performance in both prediction and planning tasks. Data for nuPlan [80] and WOMD [35] are complied with CC-BY-NC 4.0 licence and Apache License 2.0;
Dataset Splits	Yes	For planning tasks in nuPlan, there are in total 1M training cases with 8s horizons. 8,300 separated testing set are chosen by Test14-Hard and Test14-Random benchmarks [73] for hard-core and general driving scenes. With further demands verifying maneuvers under interactive cases, we build the Test14-Inter benchmark filtering 1,340 scenes by testing set. The motion prediction tasks in WOMD share 487k training scenarios, with 44k validation and 44k testing set separately partitioned under two challenges:
Hardware Specification	Yes	BeTopNet for both prediction and planning tasks are trained in end-to-end manners by AdamW optimizer with 4 NVIDIA A100 GPUs.
Software Dependencies	No	BeTopNet for both prediction and planning tasks are trained in end-to-end manners by AdamW optimizer with 4 NVIDIA A100 GPUs.
Experiment Setup	Yes	The learning rate is configured as 1e-4 scheduled with the multi-step reduction strategy. The planning model is trained by 25 epochs with a batch size of 128, while the prediction task is trained with 30 epochs with a batch of 256.
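
The Experiment Setup row can be read as a small training configuration. The sketch below is illustrative only and is not taken from the BeTop code base: the names TrainConfig and multistep_lr are hypothetical, and the milestones and decay factor in the usage line are placeholder assumptions, since the quoted text gives the base learning rate and the strategy but not the reduction points or factor.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the Experiment Setup and Hardware rows."""
    epochs: int
    batch_size: int
    base_lr: float = 1e-4     # stated learning rate
    optimizer: str = "AdamW"  # stated optimizer
    num_gpus: int = 4         # stated: 4 NVIDIA A100 GPUs


# Values as reported for the two tasks.
PLANNING = TrainConfig(epochs=25, batch_size=128)
PREDICTION = TrainConfig(epochs=30, batch_size=256)


def multistep_lr(base_lr: float, epoch: int,
                 milestones: Sequence[int], gamma: float) -> float:
    """Generic multi-step reduction: multiply the learning rate by
    `gamma` once for every milestone epoch already reached.

    `milestones` and `gamma` are NOT given in the quoted text; callers
    must supply their own values.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


# Hypothetical schedule for the planning model (placeholder values).
schedule = [multistep_lr(PLANNING.base_lr, e, milestones=[10, 20], gamma=0.5)
            for e in range(PLANNING.epochs)]
```

With these placeholder values, the learning rate would stay at the base value until epoch 10 and halve at each milestone; the actual schedule should be taken from the released code rather than from this sketch.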