NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving Scenario
Authors: Tianwen Qian, Jingjing Chen, Linhai Zhuo, Yang Jiao, Yu-Gang Jiang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments highlight the challenges posed by this new task. ... We establish several baseline models and extensively evaluate the performance of existing techniques for this task. Additionally, we conduct ablation experiments to analyze specific techniques that are relevant to this task, which provide a foundation for future research. |
| Researcher Affiliation | Academia | 1Academy for Engineering and Technology, Fudan University 2Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University |
| Pseudocode | No | The paper states 'Algorithm details can be found in supplementary materials.' but does not present any pseudocode or algorithm blocks within the main body of the paper. |
| Open Source Code | Yes | Codes and dataset are available at https://github.com/qiantianwen/NuScenes-QA. |
| Open Datasets | Yes | Codes and dataset are available at https://github.com/qiantianwen/NuScenes-QA. ... The proposed NuScenes-QA is built upon nuScenes (Caesar et al. 2020), which is a popular 3D perception dataset for autonomous driving. |
| Dataset Splits | No | NuScenes-QA provides 459,941 question-answer pairs across 34,149 visual scenes, with 376,604 questions from 28,130 scenes for training, and 83,337 questions from 6,019 scenes for testing. This text only explicitly mentions training and testing splits for their dataset, not a separate validation split for their experiments. |
| Hardware Specification | Yes | All experiments are conducted with a batch size of 256 on 2 NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions various models and optimizers like 'pre-trained GloVe', 'biLSTM', 'ResNet', 'FPN', 'MCAN', 'BUTD', and 'Adam optimizer', but it does not specify any version numbers for these software components or libraries. |
| Experiment Setup | Yes | The dimension of the QA model dm is set to 512, and MCAN adopts a 6-layer encoder-decoder version. As for training, we used the Adam optimizer with an initial learning rate of 1e-4 and half decaying every 2 epochs. All experiments are conducted with a batch size of 256 on 2 NVIDIA GeForce RTX 3090 GPUs. |
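The reported schedule (Adam at an initial learning rate of 1e-4, halved every 2 epochs) is a standard step decay. The helper below is an illustrative reconstruction of that schedule, not the authors' code; the function name and defaults are assumptions for the sketch:

```python
def step_decay_lr(epoch, base_lr=1e-4, decay_every=2, factor=0.5):
    """Learning rate under the reported schedule: start at base_lr and
    multiply by `factor` once per `decay_every` completed epochs."""
    return base_lr * factor ** (epoch // decay_every)

# Epochs 0-1 train at 1e-4, epochs 2-3 at 5e-5, epochs 4-5 at 2.5e-5, ...
for epoch in range(6):
    print(f"epoch {epoch}: lr = {step_decay_lr(epoch):.2e}")
```

In a PyTorch setup this would correspond to wrapping `torch.optim.Adam(params, lr=1e-4)` with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)`.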