Understanding Human Behaviors in Crowds by Imitating the Decision-Making Process

Authors: Haosheng Zou, Hang Su, Shihong Song, Jun Zhu

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate the potential of our framework in disentangling the latent decision-making factors of pedestrians and stronger abilities in predicting future trajectories.
Researcher Affiliation | Academia | Dept. of Comp. Sci. & Tech., State Key Lab of Intell. Tech. & Sys., TNList Lab, CBICR Center, Tsinghua University, Beijing, China
Pseudocode | Yes | Algorithm 1: SA-GAIL
Open Source Code | No | The paper does not state that code for the described methodology is open-sourced, gives no repository link, and does not mention code in supplementary materials.
Open Datasets | Yes | We conducted all experiments on the publicly available Central Station dataset (Zhou, Wang, and Tang 2011), which is a surveillance video of 33 minutes long with more than 40,000 keypoint tracklets.
Dataset Splits | Yes | We select the first 80% of the tracklets as the training set, then 10% as validation and the last 10% as test, and report the test error.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments; it only implies that computing resources were used.
Software Dependencies | No | The paper mentions "TensorFlow enabling backpropagation" but gives no version numbers for TensorFlow or for any other software dependency needed to replicate the experiments.
Experiment Setup | Yes | As per Sec. 2.1, we fix T1 = 9 and T2 = 8 in all our experiments. ... We sample all trajectories at a frame rate of 2 fps. The video is 720 pixels in width and 480 pixels in height. We normalize the two dimensions of coordinates respectively w.r.t. the size so that all coordinates lie within [0, 1]. We specify the basic network design as follows: we use an LSTM with 128 units for the encoder of the policy, and an LSTM with 128 units followed at each timestep by one fully-connected layer with 64 units and a final output fully-connected layer with 2 units. The hidden fully-connected layer employs ReLU nonlinearity as suggested by (Radford, Metz, and Chintala 2015). The 2-dimensional output is treated as Gaussian mean with pre-specified logstd to parameterize a stochastic policy for TRPO. We adopt a similar architecture for the discriminator and posterior, where we use an LSTM with 128 units to process the whole sequence and add a fully-connected output layer to the last output of the LSTM. For the discriminator the output layer has only one sigmoid unit for the probability of the trajectory being real, and for the posterior a softmax distribution. We train SA-GAIL following the training procedure in (Ho and Ermon 2016; Li, Song, and Ermon 2017).
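The Dataset Splits row describes a chronological 80/10/10 split of the tracklets. The following is a minimal Python sketch of such a split, assuming the tracklets are already ordered in time so that "first 80%" means the earliest part of the video. The function name `chronological_split` is hypothetical and does not come from the paper.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(
    tracklets: Sequence[T], train_pct: int = 80, val_pct: int = 10
) -> Tuple[List[T], List[T], List[T]]:
    """Split tracklets into train/validation/test sets without shuffling.

    Assumes `tracklets` is already sorted in time (e.g. by first frame).
    """
    n = len(tracklets)
    n_train = n * train_pct // 100  # integer arithmetic avoids float rounding
    n_val = n * val_pct // 100
    train = list(tracklets[:n_train])
    val = list(tracklets[n_train:n_train + n_val])
    test = list(tracklets[n_train + n_val:])  # the remaining ~10%
    return train, val, test


if __name__ == "__main__":
    train, val, test = chronological_split(list(range(100)))
    print(len(train), len(val), len(test))
```

Not shuffling keeps the test tracklets later in the video than the training ones, which is one plausible reading of "first" and "last" in the quoted text.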
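The Experiment Setup row states that trajectories are sampled at 2 fps and that x and y are normalized separately by the 720 x 480 frame size. A minimal sketch of that preprocessing follows. It assumes raw tracks are lists of pixel `(x, y)` points recorded at some integer `source_fps`; that parameter and the helper names are hypothetical, since the native frame rate is not given in this section.

```python
from typing import List, Sequence, Tuple

FRAME_WIDTH = 720   # pixels, as stated in the setup
FRAME_HEIGHT = 480  # pixels, as stated in the setup
TARGET_FPS = 2      # trajectories are sampled at 2 fps

Point = Tuple[float, float]


def subsample(points: Sequence[Point], source_fps: int, target_fps: int = TARGET_FPS) -> List[Point]:
    """Keep every k-th point so a track recorded at `source_fps` becomes `target_fps`."""
    if source_fps % target_fps != 0:
        raise ValueError("source_fps must be an integer multiple of target_fps")
    step = source_fps // target_fps
    return list(points[::step])


def normalize(points: Sequence[Point], width: int = FRAME_WIDTH, height: int = FRAME_HEIGHT) -> List[Point]:
    """Scale x by the frame width and y by the frame height so both lie in [0, 1]."""
    return [(x / width, y / height) for x, y in points]
```

Normalizing each axis by its own extent (rather than by a single scale) matches the phrase "normalize the two dimensions of coordinates respectively w.r.t. the size".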
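The setup fixes T1 = 9 and T2 = 8 by reference to Sec. 2.1, which is not reproduced here. Assuming T1 is the number of observed steps and T2 the number of future steps to predict, a trajectory can be cut into (observed, future) training pairs as sketched below. The sliding-window cutting and the `stride` parameter are assumptions for illustration, not details confirmed by the text.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

T1 = 9  # observed steps (assumed meaning of T1)
T2 = 8  # future steps to predict (assumed meaning of T2)


def make_windows(
    track: Sequence[T], t1: int = T1, t2: int = T2, stride: int = 1
) -> List[Tuple[List[T], List[T]]]:
    """Cut one trajectory into (observed, future) pairs of lengths t1 and t2."""
    span = t1 + t2
    pairs = []
    # Tracks shorter than t1 + t2 produce no windows at all.
    for start in range(0, len(track) - span + 1, stride):
        window = list(track[start:start + span])
        pairs.append((window[:t1], window[t1:]))
    return pairs
```

Each window holds T1 + T2 = 17 consecutive samples; at the stated 2 fps that is 8 seconds between the first and last point.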
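The policy's 2-dimensional output is used as the mean of a Gaussian with a pre-specified log standard deviation, which TRPO needs in order to sample actions and evaluate their log-likelihoods. The sketch below shows only that output head in plain Python, with the LSTM and fully-connected layers abstracted away as a given `mean`. The numeric value of `LOG_STD` is hypothetical, since the section does not state it.

```python
import math
import random
from typing import List, Optional, Sequence

# Hypothetical value: the setup calls the log std "pre-specified"
# but this section does not give the number.
LOG_STD = -2.0


def gaussian_log_prob(action: Sequence[float], mean: Sequence[float], log_std: float = LOG_STD) -> float:
    """Log-density of a diagonal Gaussian whose dimensions share one fixed log std."""
    std = math.exp(log_std)
    total = 0.0
    for a, m in zip(action, mean):
        z = (a - m) / std
        total += -0.5 * z * z - log_std - 0.5 * math.log(2.0 * math.pi)
    return total


def sample_action(
    mean: Sequence[float], log_std: float = LOG_STD, rng: Optional[random.Random] = None
) -> List[float]:
    """Draw one action around the network's predicted mean."""
    rng = rng or random.Random()
    std = math.exp(log_std)
    return [rng.gauss(m, std) for m in mean]
```

Because the log std is fixed rather than learned, only the mean depends on the network parameters, so the policy gradient flows entirely through the predicted mean.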
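The discriminator ends in one sigmoid unit (probability that a trajectory is real) and the posterior in a softmax over latent codes. One common way to turn these heads into a per-trajectory reward in InfoGAIL-style training is log D(trajectory) + lambda * log Q(code | trajectory). The sketch below computes that quantity stably from raw logits. The reward form and the weight `info_weight` are assumptions for illustration, not SA-GAIL's confirmed objective.

```python
import math
from typing import List, Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)) for the discriminator's single output unit."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-probabilities for the posterior's softmax head."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def surrogate_reward(
    disc_logit: float, posterior_logits: Sequence[float], latent_code: int, info_weight: float = 0.1
) -> float:
    """Assumed reward: log D(traj) + info_weight * log Q(latent_code | traj)."""
    return log_sigmoid(disc_logit) + info_weight * log_softmax(posterior_logits)[latent_code]
```

Working in log space from the logits avoids `log(0)` when the sigmoid or softmax saturates, which a naive `math.log(sigmoid(x))` would hit for large negative logits.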