Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

EfficientPIE: Real-Time Prediction on Pedestrian Crossing Intention with Sole Observation

Authors: Fang Qu, Pengzhan Zhou, Yuepeng He, Kaixin Gao, Youyu Luo, Xin Feng, Yu Liu, Songtao Guo

IJCAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct sufficient experiments on PIE and JAAD. The results demonstrate that EfficientPIE outperforms state-of-the-art models and strikes a good trade-off between efficiency and accuracy, which is crucial in realistic autonomous driving systems.
Researcher Affiliation | Academia | Fang Qu¹, Pengzhan Zhou¹, Yuepeng He¹, Kaixin Gao¹, Youyu Luo¹, Xin Feng², Yu Liu³ and Songtao Guo¹. ¹College of Computer Science, Chongqing University; ²School of Computer Science and Engineering, Chongqing University of Technology; ³Department of Computing, Hong Kong Polytechnic University.
Pseudocode | Yes | Algorithm 1 Progressive Perturbation
1: Input: the perturbation level m, the image X and label Y, the number of total training epochs E
2: Output: parameters of the trained model
3: for j = 0 to E − 1 do
4:   Train the model and compute prediction Ŷ
5:   Compute the perturbation δ using Eqns. 16 and 17
6:   Alter the prediction to obtain Ẑ using Eq. 15
7:   Compute the loss L based on the altered Ẑ and Y
8:   Backpropagate and update the network parameters
9: end for
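The quoted pseudocode can be sketched in Python. Eqns. 15–17 are not reproduced in this excerpt, so `compute_perturbation` and `alter_prediction` below are hypothetical placeholders (Gaussian noise and renormalization), and the backpropagation step is omitted; only the loop structure of Algorithm 1 is illustrated.

```python
import numpy as np

def compute_perturbation(pred, level, seed=0):
    # Hypothetical stand-in for Eqns. 16-17 (not given in the excerpt):
    # Gaussian noise scaled by the perturbation level m.
    rng = np.random.default_rng(seed)
    return level * rng.standard_normal(pred.shape)

def alter_prediction(pred, delta):
    # Hypothetical stand-in for Eq. 15: perturb, then renormalize
    # so each row is again a probability distribution.
    z = np.clip(pred + delta, 1e-6, None)
    return z / z.sum(axis=-1, keepdims=True)

def train_progressive_perturbation(predict, X, Y, m=0.1, epochs=5):
    """Loop of Algorithm 1: each epoch, perturb the prediction Y_hat into
    Z_hat and compute the loss on Z_hat (parameter update omitted here)."""
    losses = []
    for j in range(epochs):                              # for j = 0 to E-1
        y_hat = predict(X)                               # prediction Y_hat
        delta = compute_perturbation(y_hat, m, seed=j)   # Eqns. 16-17
        z_hat = alter_prediction(y_hat, delta)           # Eq. 15
        # Cross-entropy between the altered Z_hat and the labels Y
        loss = -np.mean(np.log(z_hat[np.arange(len(Y)), Y]))
        losses.append(loss)
    return losses
```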
Open Source Code | Yes | Our code is available at https://github.com/heinideyibadiaole/Efficient PIE.
Open Datasets | Yes | To validate our method, we use the Pedestrian Intention Estimation (PIE) and Joint Attention in Autonomous Driving (JAAD) datasets as the main benchmarks for our experiments. JAAD is a large dataset for pedestrian crossing prediction, composed of recorded video clips. It has 3955 training sequences, but its insufficient positive samples prevent models from learning a representation of crossing intention. Owing to this weakness, PIE was proposed as a more balanced benchmark [Rasouli et al., 2019] that officially defines the intention, i.e., the potential goal of pedestrians.
Dataset Splits | No | Compared to JAAD, PIE is generated from longer, continuous videos and focuses more on pedestrian samples that are likely to cross the road. Specifically, PIE contains more positive samples than JAAD, which helps the model capture the semantic pattern of crossing events. Moreover, PIE provides a pedestrian intention label, while JAAD uses the crossing action label as a substitute. Both datasets provide bounding-box annotations for each concerned pedestrian. The pedestrian tracks are generated in the same way as [Rasouli et al., 2019], and the tracks are clipped with an overlap ratio of 0.5. After clipping, JAAD has 40046 samples and PIE has 19086 samples, all taken before the crossing event occurs. To compute the intention more efficiently, we choose the last frame of each sample and crop the image to 300×300 around the labelled pedestrian as input.
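The preprocessing the excerpt describes (clipping tracks with a 0.5 overlap ratio and cropping a 300×300 window around the labelled pedestrian) can be sketched as follows. This is a minimal interpretation, not the authors' released code; the clip length, bounding-box format (x1, y1, x2, y2), and zero-padding at image borders are assumptions.

```python
import numpy as np

def clip_track(track_len, clip_len, overlap=0.5):
    """Start indices of clips taken from one pedestrian track with the
    stated overlap ratio (assumed reading of the 0.5-overlap clipping)."""
    stride = max(1, int(round(clip_len * (1 - overlap))))
    return list(range(0, track_len - clip_len + 1, stride))

def crop_around_pedestrian(image, bbox, size=300):
    """Crop a size x size window centred on the bounding box (x1, y1, x2, y2),
    zero-padded where the window extends past the image bounds."""
    h, w = image.shape[:2]
    cx = int((bbox[0] + bbox[2]) / 2)
    cy = int((bbox[1] + bbox[3]) / 2)
    x1, y1 = cx - size // 2, cy - size // 2
    out = np.zeros((size, size) + image.shape[2:], dtype=image.dtype)
    sx1, sy1 = max(x1, 0), max(y1, 0)
    sx2, sy2 = min(x1 + size, w), min(y1 + size, h)
    out[sy1 - y1:sy2 - y1, sx1 - x1:sx2 - x1] = image[sy1:sy2, sx1:sx2]
    return out
```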
Hardware Specification | Yes | EfficientPIE is trained and evaluated with an NVIDIA RTX 3090 GPU.
Software Dependencies | No | The RMSProp optimizer is used with weight decay 1e-4; the learning rate is set to 1e-5 and decreased with a cosine annealing schedule, which may contribute to the performance [He et al., 2019].
Experiment Setup | Yes | The RMSProp optimizer is used with weight decay 1e-4; the learning rate is set to 1e-5 and decreased with a cosine annealing schedule, which may contribute to the performance [He et al., 2019]. The model is trained for 50 epochs with a batch size of 32. The training settings for the two datasets are identical.
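The excerpt names the schedule but not its exact form, so the sketch below uses the standard cosine annealing formulation [He et al., 2019]: the learning rate decays from the stated base of 1e-5 to an assumed minimum of 0 over the 50 training epochs.

```python
import math

def cosine_annealed_lr(epoch, total_epochs=50, base_lr=1e-5, min_lr=0.0):
    """Standard cosine annealing: lr goes from base_lr at epoch 0
    down to min_lr at epoch total_epochs along a half cosine."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * epoch / total_epochs))
```

With these values the rate starts at 1e-5, passes 5e-6 at epoch 25, and reaches 0 at epoch 50.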