Multimodal Transformer Networks for Pedestrian Trajectory Prediction

Authors: Ziyi Yin, Ruijin Liu, Zhiliang Xiong, Zejian Yuan

IJCAI 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our multimodal transformer is validated on the PIE and JAAD datasets and achieves state-of-the-art performance with the most lightweight model size. |
| Researcher Affiliation | Collaboration | Ziyi Yin¹, Ruijin Liu¹, Zhiliang Xiong², Zejian Yuan¹. ¹Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, China. ²Shenzhen Forward Innovation Digital Technology Co., Ltd., China. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The codes are available at https://github.com/ericyinyzy/MTNtrajectory. |
| Open Datasets | Yes | We evaluate MTN on the Pedestrian Intention Estimation (PIE) [Rasouli et al., 2019] and Joint Attention in Autonomous Driving (JAAD) [Rasouli et al., 2017] datasets. |
| Dataset Splits | No | The paper mentions "train/test splits" but does not explicitly provide details or percentages for a validation split. |
| Hardware Specification | Yes | All experiments are conducted on a single GTX 2080Ti. |
| Software Dependencies | No | The paper mentions specific tools such as RAFT [Teed and Deng, 2020] and the Adam optimizer [Kingma and Ba, 2015], but gives no version numbers for these or for general dependencies such as the programming language or deep learning framework. |
| Experiment Setup | Yes | The height H_ego and width W_ego of the center ROI are set to 160 pixels, and the numbers of patches M and P are 64 and 9. Each patch covers the same area. The length of the observation sequence T is 15 frames (0.5 s) and the length of the prediction sequence N is 45 frames (1.5 s). The total number of training epochs is 80, and ten epochs are used to warm up parts of the MTN as Sec. 2.3 states. The batch size is 128 and the Adam optimizer [Kingma and Ba, 2015] is used. |
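For orientation, the hyperparameters reported in the Experiment Setup row can be gathered into a single configuration sketch. The snippet below is illustrative only: the class and field names (`MTNTrainConfig`, `roi_height`, `obs_len`, etc.) are our own and do not come from the authors' repository, and the 30 fps frame rate is inferred from the stated 15-frame / 0.5 s correspondence.

```python
from dataclasses import dataclass

@dataclass
class MTNTrainConfig:
    """Training/evaluation settings reported in the paper (field names are illustrative)."""
    roi_height: int = 160     # H_ego of the center ROI, in pixels
    roi_width: int = 160      # W_ego of the center ROI, in pixels
    num_patches_m: int = 64   # M: equal-area patches over the ROI
    num_patches_p: int = 9    # P
    obs_len: int = 15         # observed frames T (0.5 s at 30 fps)
    pred_len: int = 45        # predicted frames N (1.5 s at 30 fps)
    epochs: int = 80          # total training epochs
    warmup_epochs: int = 10   # epochs used to warm up parts of the MTN
    batch_size: int = 128
    optimizer: str = "adam"   # Adam [Kingma and Ba, 2015]

cfg = MTNTrainConfig()
# Sanity check on the equal-area claim: 64 patches over a 160x160 ROI
# correspond to an 8x8 grid of 20x20-pixel patches.
patch_side = (cfg.roi_height * cfg.roi_width / cfg.num_patches_m) ** 0.5
print(f"{cfg.num_patches_m} patches of {patch_side:.0f}x{patch_side:.0f} px each")
```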