A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 12 Issue 9
Sep. 2025

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 19.2, Top 1 (SCI Q1)
  • CiteScore: 28.2, Top 1% (Q1)
  • Google Scholar h5-index: 95, Top 5
Citation: R. Wang, Z. Hu, J. Yu, and J. Cheng, “Modelling diverse interactions and multimodality for pedestrian trajectory prediction,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 9, pp. 1801–1813, Sept. 2025. doi: 10.1109/JAS.2025.125363

Modelling Diverse Interactions and Multimodality for Pedestrian Trajectory Prediction

doi: 10.1109/JAS.2025.125363
Abstract

Pedestrian trajectory prediction can significantly enhance the perception and decision-making capabilities of autonomous driving systems and camera-based intelligent surveillance systems by predicting the states and behavioral intentions of surrounding pedestrians. However, existing trajectory prediction methods still fail to effectively model the diverse and complex interactions found in the real world, including pedestrian-pedestrian and pedestrian-environment interactions. Moreover, these methods do not effectively capture and characterize the multimodal nature of future trajectories. To address these challenges, we devise a hand-designed graph convolution together with a spatial cross-attention mechanism to dynamically capture the diverse spatial interactions between pedestrians. To effectively explore the impact of the scene on pedestrian trajectories, we build a pedestrian map that reflects scene constraints and pedestrian motion preferences. Meanwhile, we construct a trajectory multimodality-aware module that captures the different potential modes implicit in diverse social behaviors, accounting for the uncertainty of pedestrians' future trajectories. Finally, we compare the proposed method with trajectory prediction baselines on commonly used public pedestrian benchmarks, demonstrating the superior performance of our approach.
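The abstract describes a spatial interaction module built from a hand-designed graph convolution and spatial cross attention over pedestrian states. The paper's exact formulation is not reproduced on this page; the snippet below is a minimal PyTorch sketch of that general idea, assuming an inverse-distance adjacency kernel and standard multi-head attention. All module names, dimensions, and the kernel choice are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class SpatialInteraction(nn.Module):
    """Illustrative sketch (not the authors' code): a hand-designed graph
    convolution plus spatial cross attention over per-frame pedestrian features."""
    def __init__(self, embed_dim=64, num_heads=4):
        super().__init__()
        self.embed = nn.Linear(2, embed_dim)               # (x, y) position -> feature
        self.gcn_weight = nn.Linear(embed_dim, embed_dim)  # graph-convolution transform
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.out = nn.Linear(2 * embed_dim, embed_dim)

    @staticmethod
    def distance_adjacency(pos, eps=1e-6):
        # Assumed hand-designed adjacency: inverse pairwise distance, row-normalized.
        diff = pos.unsqueeze(1) - pos.unsqueeze(2)          # (B, N, N, 2)
        dist = diff.norm(dim=-1)                            # (B, N, N)
        adj = 1.0 / (dist + eps)
        adj = adj * (1.0 - torch.eye(pos.size(1), device=pos.device))  # zero self-loops
        return adj / adj.sum(dim=-1, keepdim=True).clamp(min=eps)

    def forward(self, pos):
        # pos: (B, N, 2) pedestrian positions at one time step.
        h = self.embed(pos)                                 # (B, N, D)
        adj = self.distance_adjacency(pos)                  # (B, N, N)
        h_gcn = torch.relu(self.gcn_weight(adj @ h))        # graph-convolution branch
        h_attn, _ = self.cross_attn(h, h, h)                # spatial attention branch
        return self.out(torch.cat([h_gcn, h_attn], dim=-1))

if __name__ == "__main__":
    positions = torch.randn(8, 5, 2)                        # 8 scenes, 5 pedestrians each
    module = SpatialInteraction()
    print(module(positions).shape)                          # torch.Size([8, 5, 64])

In this sketch the two branches are simply concatenated and projected; the paper's actual fusion of the graph-convolution and cross-attention outputs, and its scene (pedestrian-map) and multimodality-aware modules, are not shown here.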

     

