Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Multimodal Transformer Networks for Pedestrian Trajectory Prediction
Authors: Ziyi Yin, Ruijin Liu, Zhiliang Xiong, Zejian Yuan
IJCAI 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our multimodal transformer is validated on the PIE and JAAD datasets and achieves state-of-the-art performance with the most lightweight model size. |
| Researcher Affiliation | Collaboration | Ziyi Yin¹, Ruijin Liu¹, Zhiliang Xiong², Zejian Yuan¹. ¹Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, China; ²Shenzhen Forward Innovation Digital Technology Co., Ltd., China |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The codes are available at https://github.com/ericyinyzy/MTNtrajectory. |
| Open Datasets | Yes | We evaluate MTN on Pedestrian Intention Estimation (PIE) [Rasouli et al., 2019] and Joint Attention in Autonomous Driving (JAAD) [Rasouli et al., 2017] datasets. |
| Dataset Splits | No | The paper mentions "train/test splits" but does not explicitly provide split details or percentages, and no validation split is described. |
| Hardware Specification | Yes | All experiments are conducted on a single GTX 2080Ti. |
| Software Dependencies | No | The paper mentions using specific tools like RAFT [Teed and Deng, 2020] and the Adam optimizer [Kingma and Ba, 2015], but does not provide specific version numbers for these or other general software dependencies like programming languages or deep learning frameworks. |
| Experiment Setup | Yes | The height Hego and width Wego of the center ROI are set to 160 pixels, and the numbers of patches M and P are 64 and 9. Each patch covers the same area. The length of the observation sequence T is 15 frames (0.5s) and the length of the prediction sequence N is 45 frames (1.5s). The total number of training epochs is 80, and ten epochs are used to warm up parts of the MTN as Sec. 2.3 states. The batch size is 128 and the Adam optimizer [Kingma and Ba, 2015] is used. |
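The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. This is a minimal illustration, not the authors' code: the variable names are invented here, and the 30 fps frame rate is inferred from the stated 15 frames = 0.5 s.

```python
# Hypothetical configuration assembled from the reported MTN training setup.
# Field names are illustrative and do not come from the authors' repository.
CONFIG = {
    "roi_height": 160,    # H_ego, pixels (center ROI)
    "roi_width": 160,     # W_ego, pixels (center ROI)
    "num_patches_m": 64,  # M
    "num_patches_p": 9,   # P
    "obs_len": 15,        # T: observation frames (0.5 s)
    "pred_len": 45,       # N: prediction frames (1.5 s)
    "epochs": 80,         # total training epochs
    "warmup_epochs": 10,  # epochs used to warm up parts of the MTN
    "batch_size": 128,
    "optimizer": "Adam",  # Kingma and Ba, 2015
}

# Sanity check: 15 frames over 0.5 s implies a 30 fps annotation rate,
# which is consistent with 45 prediction frames spanning 1.5 s.
FPS = CONFIG["obs_len"] / 0.5
PRED_HORIZON_S = CONFIG["pred_len"] / FPS
```

Keeping the quoted numbers in one place like this makes it easy to check internal consistency (e.g. that the observation and prediction windows imply the same frame rate) before attempting a reproduction.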