A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 12 Issue 9
Sep. 2025

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 19.2, Top 1 (SCI Q1)
  • CiteScore: 28.2, Top 1% (Q1)
  • Google Scholar h5-index: 95, Top 5
Citation: R. Wang, Z. Hu, J. Yu, and J. Cheng, “Modelling diverse interactions and multimodality for pedestrian trajectory prediction,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 9, pp. 1801–1813, Sept. 2025. doi: 10.1109/JAS.2025.125363

Modelling Diverse Interactions and Multimodality for Pedestrian Trajectory Prediction

doi: 10.1109/JAS.2025.125363
Abstract

Pedestrian trajectory prediction can significantly enhance the perception and decision-making capabilities of autonomous driving systems and camera-based intelligent surveillance systems by predicting the states and behavioral intentions of surrounding pedestrians. However, existing trajectory prediction methods still fail to effectively model the diverse and complex interactions found in the real world, including pedestrian-pedestrian and pedestrian-environment interactions. Moreover, these methods do not effectively capture and characterize the multimodal nature of future trajectories. To address these challenges, we devise a hand-designed graph convolution together with a spatial cross-attention mechanism to dynamically capture the diverse spatial interactions between pedestrians. To effectively explore the impact of the scene on pedestrian trajectories, we build a pedestrian map that reflects scene constraints and pedestrian motion preferences. Meanwhile, we construct a trajectory multimodality-aware module that captures the different potential modes implicit in diverse social behaviors, accounting for the uncertainty of pedestrians' future trajectories. Finally, we compare the proposed method with trajectory prediction baselines on commonly used public pedestrian benchmarks, demonstrating the superior performance of our approach.
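The abstract describes a spatial interaction module built from a hand-designed graph convolution and spatial cross attention over pedestrian states. The paper's exact formulation is not reproduced on this page; the snippet below is a minimal PyTorch sketch of that general idea, assuming an inverse-distance adjacency kernel and standard multi-head attention. All module names, dimensions, and the kernel choice are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class SpatialInteraction(nn.Module):
    """Illustrative sketch (not the authors' code): a hand-designed graph
    convolution plus spatial cross attention over per-frame pedestrian features."""
    def __init__(self, embed_dim=64, num_heads=4):
        super().__init__()
        self.embed = nn.Linear(2, embed_dim)               # (x, y) position -> feature
        self.gcn_weight = nn.Linear(embed_dim, embed_dim)  # graph-convolution transform
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.out = nn.Linear(2 * embed_dim, embed_dim)

    @staticmethod
    def distance_adjacency(pos, eps=1e-6):
        # Assumed hand-designed adjacency: inverse pairwise distance, row-normalized.
        diff = pos.unsqueeze(1) - pos.unsqueeze(2)          # (B, N, N, 2)
        dist = diff.norm(dim=-1)                            # (B, N, N)
        adj = 1.0 / (dist + eps)
        adj = adj * (1.0 - torch.eye(pos.size(1), device=pos.device))  # zero self-loops
        return adj / adj.sum(dim=-1, keepdim=True).clamp(min=eps)

    def forward(self, pos):
        # pos: (B, N, 2) pedestrian positions at one time step.
        h = self.embed(pos)                                 # (B, N, D)
        adj = self.distance_adjacency(pos)                  # (B, N, N)
        h_gcn = torch.relu(self.gcn_weight(adj @ h))        # graph-convolution branch
        h_attn, _ = self.cross_attn(h, h, h)                # spatial attention branch
        return self.out(torch.cat([h_gcn, h_attn], dim=-1))

if __name__ == "__main__":
    positions = torch.randn(8, 5, 2)                        # 8 scenes, 5 pedestrians each
    module = SpatialInteraction()
    print(module(positions).shape)                          # torch.Size([8, 5, 64])

In this sketch the two branches are simply concatenated and projected; the paper's actual fusion of the graph-convolution and cross-attention outputs, and its scene (pedestrian-map) and multimodality-aware modules, are not shown here.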

     

