Social-Transmotion: Promptable Human Trajectory Prediction

Authors: Saeed Saadatnejad, Yang Gao, Kaouther Messaoud, Alexandre Alahi

ICLR 2024

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY. Our experimental results demonstrate that Social-Transmotion outperforms previous models on several datasets.

Researcher Affiliation | Academia | Saeed Saadatnejad, Yang Gao, Kaouther Messaoud, Alexandre Alahi; Visual Intelligence for Transportation (VITA) laboratory, EPFL, Switzerland; {firstname.lastname}@epfl.ch

Pseudocode | No | The paper includes architectural diagrams (Figure 1, Figure 2) but does not provide any explicitly labeled pseudocode or algorithm blocks.

Open Source Code | Yes | The code is publicly available: https://github.com/vita-epfl/social-transmotion.

Open Datasets | Yes | We evaluate on three publicly available datasets providing visual cues: the JTA (Fabbri et al., 2018) and the JRDB (Martin-Martin et al., 2021) in the main text, and the Pedestrians and Cyclists in Road Traffic (Kress et al., 2022) in Appendix A.1. Furthermore, we report on the ETH-UCY dataset (Pellegrini et al., 2009; Lerner et al., 2007), which does not contain visual cues, in Appendix A.2.

Dataset Splits | Yes | JTA dataset: a large-scale synthetic dataset containing 256 training sequences, 128 validation sequences, and 128 test sequences, with a total of approximately 10 million 3D keypoint annotations. For the JRDB dataset, after extracting the data, we used the TrajNet++ (Kothari et al., 2021) code base to generate four types of trajectories with acceptance rates of 1.0, 1.0, 1.0, 1.0. We used gates-ai-lab-2019-02-08_0 for validation, and the indoor video packard-poster-session-2019-03-20_1 together with the outdoor videos bytes-cafe-2019-02-07_0, gates-basement-elevators-2019-01-17_1, hewlett-packard-intersection-2019-01-24_0, huang-lane-2019-02-12_0, jordan-hall-2019-04-22_0, packard-poster-session-2019-03-20_2, stlc-111-2019-04-19_0, svl-meeting-gates-2-2019-04-08_0, svl-meeting-gates-2-2019-04-08_1, and tressider-2019-03-16_1 for training.

Hardware Specification | Yes | All computations were performed on an NVIDIA V100 GPU equipped with 32 GB of memory.

Software Dependencies | No | The paper mentions using the Adam optimizer and the TrajNet++ code base, but does not provide specific version numbers for these or for other software dependencies such as the programming language (e.g., Python) or the deep learning framework (e.g., PyTorch, TensorFlow).

Experiment Setup | Yes | The architecture of CMT includes six layers and four heads, whereas ST is constructed with three layers and four heads; both utilize a model dimension of 128. We employed the Adam optimizer (Kingma & Ba, 2014) with an initial learning rate of 1e-4, which was reduced by a factor of 0.1 after 80% of the 50 total epochs were completed. We had 30% modality-masking and 10% meta-masking.
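The learning-rate schedule quoted in the Experiment Setup row (initial rate 1e-4, reduced by a factor of 0.1 once 80% of the 50 epochs are completed) can be sketched as a small step-decay function. This is a minimal illustration, not the authors' code; the function name and zero-based epoch indexing are assumptions.

```python
def step_lr(epoch, total_epochs=50, base_lr=1e-4, decay=0.1, milestone_frac=0.8):
    """Step decay: scale base_lr by `decay` once `milestone_frac` of training has elapsed."""
    milestone = int(total_epochs * milestone_frac)  # epoch 40 for 50 total epochs
    return base_lr * decay if epoch >= milestone else base_lr

# Epochs 0-39 train at 1e-4; epochs 40-49 train at 1e-5.
```

In PyTorch, the same schedule corresponds to `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40], gamma=0.1)` applied on top of `torch.optim.Adam(params, lr=1e-4)`.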