NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving Scenario
Authors: Tianwen Qian, Jingjing Chen, Linhai Zhuo, Yang Jiao, Yu-Gang Jiang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments highlight the challenges posed by this new task. ... We establish several baseline models and extensively evaluate the performance of existing techniques for this task. Additionally, we conduct ablation experiments to analyze specific techniques that are relevant to this task, which provide a foundation for future research. |
| Researcher Affiliation | Academia | 1Academy for Engineering and Technology, Fudan University 2Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University |
| Pseudocode | No | The paper states 'Algorithm details can be found in supplementary materials.' but does not present any pseudocode or algorithm blocks within the main body of the paper. |
| Open Source Code | Yes | Codes and dataset are available at https://github.com/qiantianwen/NuScenes-QA. |
| Open Datasets | Yes | Codes and dataset are available at https://github.com/qiantianwen/NuScenes-QA. ... The proposed NuScenes-QA is built upon nuScenes (Caesar et al. 2020), which is a popular 3D perception dataset for autonomous driving. |
| Dataset Splits | No | NuScenes-QA provides 459,941 question-answer pairs across 34,149 visual scenes, with 376,604 questions from 28,130 scenes for training, and 83,337 questions from 6,019 scenes for testing. This text only explicitly mentions training and testing splits for their dataset, not a separate validation split for their experiments. |
| Hardware Specification | Yes | All experiments are conducted with a batch size of 256 on 2 NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions various models and optimizers like 'pre-trained GloVe', 'biLSTM', 'ResNet', 'FPN', 'MCAN', 'BUTD', and 'Adam optimizer', but it does not specify any version numbers for these software components or libraries. |
| Experiment Setup | Yes | The dimension of the QA model dm is set to 512, and MCAN adopts a 6-layer encoder-decoder version. As for training, we used the Adam optimizer with an initial learning rate of 1e-4 and half decaying every 2 epochs. All experiments are conducted with a batch size of 256 on 2 NVIDIA GeForce RTX 3090 GPUs. |
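The reported schedule (Adam at an initial learning rate of 1e-4, halved every 2 epochs) is a standard step decay. The helper below is an illustrative reconstruction of that schedule, not the authors' code; the function name and defaults are assumptions for the sketch:

```python
def step_decay_lr(epoch, base_lr=1e-4, decay_every=2, factor=0.5):
    """Learning rate under the reported schedule: start at base_lr and
    multiply by `factor` once per `decay_every` completed epochs."""
    return base_lr * factor ** (epoch // decay_every)

# Epochs 0-1 train at 1e-4, epochs 2-3 at 5e-5, epochs 4-5 at 2.5e-5, ...
for epoch in range(6):
    print(f"epoch {epoch}: lr = {step_decay_lr(epoch):.2e}")
```

In a PyTorch setup this would correspond to wrapping `torch.optim.Adam(params, lr=1e-4)` with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)`.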