Understanding Human Behaviors in Crowds by Imitating the Decision-Making Process
Authors: Haosheng Zou, Hang Su, Shihong Song, Jun Zhu
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the potential of our framework in disentangling the latent decision-making factors of pedestrians and stronger abilities in predicting future trajectories. |
| Researcher Affiliation | Academia | Dept. of Comp. Sci. & Tech., State Key Lab of Intell. Tech. & Sys., TNList Lab, CBICR Center, Tsinghua University, Beijing, China |
| Pseudocode | Yes | Algorithm 1 SA-GAIL |
| Open Source Code | No | The paper does not provide a concrete statement about open-sourcing code for the methodology described, nor does it provide a specific repository link or mention code in supplementary materials. |
| Open Datasets | Yes | We conducted all experiments on the publicly available Central Station dataset (Zhou, Wang, and Tang 2011), which is a surveillance video of 33 minutes long with more than 40,000 keypoint tracklets. |
| Dataset Splits | Yes | We select the first 80% of the tracklets as the training set, then 10% as validation and the last 10% as test, and report the test error. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. It only implies the use of computing resources. |
| Software Dependencies | No | The paper mentions "TensorFlow enabling backpropagation" but does not provide specific version numbers for TensorFlow or any other software dependencies needed to replicate the experiment. |
| Experiment Setup | Yes | As per Sec. 2.1, we fix T1 = 9 and T2 = 8 in all our experiments. ... We sample all trajectories at a frame rate of 2 fps. The video is 720 pixels in width and 480 pixels in height. We normalize the two dimensions of coordinates respectively w.r.t. the size so that all coordinates lie within [0, 1]. We specify the basic network design as follows: we use an LSTM with 128 units for the encoder of the policy, and an LSTM with 128 units followed at each timestep by one fully-connected layer with 64 units and a final output fully-connected layer with 2 units. The hidden fully-connected layer employs ReLU nonlinearity as suggested by (Radford, Metz, and Chintala 2015). The 2-dimensional output is treated as Gaussian mean with pre-specified logstd to parameterize a stochastic policy for TRPO. We adopt a similar architecture for the discriminator and posterior, where we use an LSTM with 128 units to process the whole sequence and add a fully-connected output layer to the last output of the LSTM. For the discriminator the output layer has only one sigmoid unit for the probability of the trajectory being real, and for the posterior a softmax distribution. We train SA-GAIL following the training procedure in (Ho and Ermon 2016; Li, Song, and Ermon 2017). |
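
The data preparation quoted in the Dataset Splits and Experiment Setup rows (per-axis normalization by the 720x480 frame size, fixed T1 = 9 and T2 = 8, and an ordered 80/10/10 split) could be sketched roughly as follows. This is an illustrative sketch and not the authors' code, which is not released. The function names are hypothetical, and reading T1 as the number of observed steps and T2 as the number of predicted steps is an assumption, since Sec. 2.1 is not reproduced here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

FRAME_WIDTH = 720   # pixels, as stated in the Experiment Setup row
FRAME_HEIGHT = 480  # pixels, as stated in the Experiment Setup row
T1 = 9  # assumed here to be the number of observed timesteps
T2 = 8  # assumed here to be the number of timesteps to predict


def normalize(track: Sequence[Point],
              width: int = FRAME_WIDTH,
              height: int = FRAME_HEIGHT) -> List[Point]:
    """Scale x by the frame width and y by the frame height into [0, 1]."""
    return [(x / width, y / height) for x, y in track]


def split_observed_future(track: Sequence[Point],
                          t_obs: int = T1,
                          t_pred: int = T2):
    """Cut a tracklet into an observed prefix and the future segment after it."""
    if len(track) < t_obs + t_pred:
        raise ValueError("tracklet shorter than t_obs + t_pred")
    return list(track[:t_obs]), list(track[t_obs:t_obs + t_pred])


def ordered_split(items: Sequence, train_frac: float = 0.8, val_frac: float = 0.1):
    """Take the first 80% for training, the next 10% for validation, the rest for test."""
    n = len(items)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = list(items[:n_train])
    val = list(items[n_train:n_train + n_val])
    test = list(items[n_train + n_val:])
    return train, val, test
```

The split keeps the original ordering rather than shuffling, matching the "first 80% ... last 10%" wording in the Dataset Splits row. Resampling the tracklets to 2 fps is left out because the source frame rate is not given in this summary.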