Citation: Z. Li, W. Wu, Y. Guo, J. Sun, and Q.-L. Han, “Embodied multi-agent systems: A review,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 6, pp. 1095–1116, Jun. 2025. doi: 10.1109/JAS.2025.125552
[1]
S. Shalev-Shwartz, S. Shammah, and A. Shashua, “Safe, multi-agent, reinforcement learning for autonomous driving,” arXiv preprint arXiv: 1610.03295, 2016.
[2]
M. Vinyals, J. A. Rodriguez-Aguilar, and J. Cerquides, “A survey on sensor networks from a multiagent perspective,” The Computer Journal, vol. 54, no. 3, pp. 455–470, 2011. doi: 10.1093/comjnl/bxq018
[3]
Y. Rizk, M. Awad, and E. W. Tunstel, “Cooperative heterogeneous multi-robot systems: A survey,” ACM Computing Surveys (CSUR), vol. 52, no. 2, pp. 1–31, 2019.
[4]
K. Zhang, Z. Yang, and T. Başar, “Multi-agent reinforcement learning: A selective overview of theories and algorithms,” Handbook of Reinforcement Learning and Control, pp. 321–384, 2021.
[5]
R. Fjelland, “Why general artificial intelligence will not be realized,” Humanities and Social Sciences Communications, vol. 7, no. 1, pp. 1–9, 2020. doi: 10.1057/s41599-020-0492-6
[6]
Y. Peng, J. Han, Z. Zhang, L. Fan, T. Liu, S. Qi, X. Feng, Y. Ma, Y. Wang, and S.-C. Zhu, “The tong test: Evaluating artificial general intelligence through dynamic embodied physical and social interactions,” Engineering, vol. 34, pp. 12–22, 2024. doi: 10.1016/j.eng.2023.07.006
[7]
A. Cangelosi, J. Bongard, M. H. Fischer, and S. Nolfi, Embodied Intelligence. Berlin, Heidelberg: Springer Berlin Heidelberg, 2015, pp. 697–714.
[8]
J. Duan, S. Yu, H. L. Tan, H. Zhu, and C. Tan, “A survey of embodied AI: From simulators to research tasks,” IEEE Trans. Emerging Topics in Computational Intelligence, vol. 6, no. 2, pp. 230–244, 2022.
[9]
T. Guo, X. Chen, Y. Wang, R. Chang, S. Pei, N. V. Chawla, O. Wiest, and X. Zhang, “Large language model based multi-agents: A survey of progress and challenges,” arXiv preprint arXiv: 2402.01680, 2024.
[10]
N. Roy, I. Posner, T. Barfoot, P. Beaudoin, Y. Bengio, J. Bohg, O. Brock, I. Depatie, D. Fox, D. Koditschek, T. Lozano-Perez, V. Mansinghka, C. Pal, B. Richards, D. Sadigh, S. Schaal, G. Sukhatme, D. Therien, M. Toussaint, and M. Van de Panne, “From machine learning to robotics: Challenges and opportunities for embodied intelligence,” arXiv preprint arXiv: 2110.15245, 2021.
[11]
W. Zhou, X. Zhu, Q.-L. Han, L. Li, X. Chen, S. Wen, and Y. Xiang, “The security of using large language models: A survey with emphasis on ChatGPT,” IEEE/CAA Journal of Automatica Sinica, vol. 12, no. 1, pp. 1–26, 2025. doi: 10.1109/JAS.2024.124983
[12]
X. Zhu, W. Zhou, Q.-L. Han, W. Ma, S. Wen, and Y. Xiang, “When software security meets large language models: A survey,” IEEE/CAA Journal of Automatica Sinica, vol. 12, no. 2, pp. 317–334, 2025. doi: 10.1109/JAS.2024.124971
[13]
Y. Liu, W. Chen, Y. Bai, J. Luo, X. Song, K. Jiang, Z. Li, G. Zhao, J. Lin, G. Li, W. Gao, and L. Lin, “Aligning cyber space with physical world: A comprehensive survey on embodied AI,” arXiv preprint arXiv: 2407.06886, 2024.
[14]
R. Brooks, “Intelligence without representation,” Artificial Intelligence, vol. 47, no. 1−3, pp. 139–159, 1991. doi: 10.1016/0004-3702(91)90053-M
[15]
L. Smith and M. Gasser, “The development of embodied cognition: Six lessons from babies,” Artificial Life, vol. 11, no. 1−2, pp. 13–29, 2005. doi: 10.1162/1064546053278973
[16]
G. Lakoff, M. Johnson, and J. F. Sowa, “Review of philosophy in the flesh: The embodied mind and its challenge to western thought,” Computational Linguistics, vol. 25, no. 4, pp. 631–634, 1999.
[17]
S. K. Ramakrishnan, D. Jayaraman, and K. Grauman, “An exploration of embodied visual exploration,” Int. Journal of Computer Vision, vol. 129, no. 5, pp. 1616–1649, 2021. doi: 10.1007/s11263-021-01437-z
[18]
G. D. Reynolds and K. C. Roth, “The development of attentional biases for faces in infancy: A developmental systems perspective,” Frontiers in Psychology, vol. 9, p. 222, 2018. doi: 10.3389/fpsyg.2018.00222
[19]
A. Dorri, S. S. Kanhere, and R. Jurdak, “Multi-agent systems: A survey,” IEEE Access, vol. 6, pp. 28573–28593, 2018. doi: 10.1109/ACCESS.2018.2831228
[20]
T. T. Nguyen, N. D. Nguyen, and S. Nahavandi, “Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications,” IEEE Trans. Cybernetics, vol. 50, no. 9, pp. 3826–3839, 2020. doi: 10.1109/TCYB.2020.2977374
[21]
G. F. Franklin, J. D. Powell, and A. Emami-Naeini, Feedback Control of Dynamic Systems, 4th ed. Upper Saddle River, NJ: Prentice Hall, 2002.
[22]
N. R. Jennings, K. Sycara, and M. Wooldridge, “A roadmap of agent research and development,” Autonomous Agents and Multi-Agent Systems, vol. 1, pp. 7–38, 1998. doi: 10.1023/A:1010090405266
[23]
C. Zhu, M. Dastani, and S. Wang, “A survey of multi-agent reinforcement learning with communication,” arXiv preprint arXiv: 2203.08975, 2022.
[24]
P. Liu, Y. Zhou, D. Peng, and D. Wu, “Global-attention-based neural networks for vision language intelligence,” IEEE/CAA Journal of Automatica Sinica, vol. 8, no. 7, pp. 1243–1252, 2021.
[25]
b. ichter, A. Brohan, Y. Chebotar, C. Finn, K. Hausman, A. Herzog, D. Ho, J. Ibarz, A. Irpan, E. Jang, R. Julian, D. Kalashnikov, S. Levine, Y. Lu, C. Parada, K. Rao, P. Sermanet, A. T. Toshev, V. Vanhoucke, F. Xia, T. Xiao, P. Xu, M. Yan, N. Brown, M. Ahn, O. Cortes, N. Sievers, C. Tan, S. Xu, D. Reyes, J. Rettinghouse, J. Quiambao, P. Pastor, L. Luu, K.-H. Lee, Y. Kuang, S. Jesmonth, N. J. Joshi, K. Jeffrey, R. J. Ruano, J. Hsu, K. Gopalakrishnan, B. David, A. Zeng, and C. K. Fu, “Do as I can, not as I say: Grounding language in robotic affordances,” in Proc. the 6th Conf. on Robot Learning, PMLR, vol. 205, 2023, pp. 287–318.
[26]
T. Yoneda, J. Fang, P. Li, H. Zhang, T. Jiang, S. Lin, B. Picker, D. Yunis, H. Mei, and M. R. Walter, “Statler: State-maintaining language models for embodied reasoning,” arXiv preprint arXiv: 2306.17840, 2023.
[27]
Y. Ze, G. Yan, Y.-H. Wu, A. Macaluso, Y. Ge, J. Ye, N. Hansen, L. E. Li, and X. Wang, “GNFactor: Multi-task real robot learning with generalizable neural feature fields,” in Conf. on Robot Learning. PMLR, vol. 229, pp. 284–301, 2023.
[28]
W. Shen, G. Yang, A. Yu, J. Wong, L. P. Kaelbling, and P. Isola, “Distilled feature fields enable few-shot language-guided manipulation,” arXiv preprint arXiv: 2308.07931, 2023.
[29]
C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid, and J. J. Leonard, “Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age,” IEEE Trans. Robotics, vol. 32, no. 6, pp. 1309–1332, 2016. doi: 10.1109/TRO.2016.2624754
[30]
Z. Qian, J. Fu, and J. Xiao, “Towards accurate loop closure detection in semantic SLAM with 3D semantic covisibility graphs,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2455–2462, 2022. doi: 10.1109/LRA.2022.3145066
[31]
H. Xu, P. Liu, X. Chen, and S. Shen, “D2SLAM: Decentralized and distributed collaborative visual-inertial SLAM system for aerial swarm,” IEEE Trans. Robotics, vol. 40, pp. 3445–3464, 2024. doi: 10.1109/TRO.2024.3422003
[32]
Y. Tian, Y. Chang, F. Herrera Arias, C. Nieto-Granda, J. P. How, and L. Carlone, “Kimera-multi: Robust, distributed, dense metric-semantic SLAM for multi-robot systems,” IEEE Trans. Robotics, vol. 38, no. 4, pp. 2022–2038, 2022. doi: 10.1109/TRO.2021.3137751
[33]
D. Shah, B. Osinski, B. Ichter, and S. Levine, “LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action,” in Conf. on Robot Learning. PMLR, vol. 205, pp. 492−504, 2023.
[34]
J. Zhang, K. Wang, R. Xu, G. Zhou, Y. Hong, X. Fang, Q. Wu, Z. Zhang, and H. Wang, “NaVid: Video-based VLM plans the next step for vision-and-language navigation,” arXiv preprint arXiv: 2402.15852, 2024.
[35]
C. Huang, O. Mees, A. Zeng, and W. Burgard, “Audio visual language maps for robot navigation,” in IEEE Int. Conf. on Robotics and Automation, 2023, pp. 10608–10615.
[36]
D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016. doi: 10.1038/nature16961
[37]
S. Fujimoto, H. Hoof, and D. Meger, “Addressing function approximation error in actor-critic methods,” in Int. Conf. on Machine Learning. PMLR, 2018, pp. 1587–1596.
[38]
E. Kaufmann, L. Bauersfeld, A. Loquercio, M. Müller, V. Koltun, and D. Scaramuzza, “Champion-level drone racing using deep reinforcement learning,” Nature, vol. 620, no. 7976, pp. 982–987, 2023. doi: 10.1038/s41586-023-06419-4
[39]
W. Huang, P. Abbeel, D. Pathak, and I. Mordatch, “Language models as zero-shot planners: Extracting actionable knowledge for embodied agents,” in Int. Conf. on Machine Learning. PMLR, 2022, pp. 9118–9147.
[40]
J. Zhang, C. Bai, H. He, Z. Wang, B. Zhao, X. Li, and X. Li, “SAM-E: Leveraging visual foundation model with sequence imitation for embodied manipulation,” in Proc. of Machine Learning Research, vol. 235, pp. 58579−58598, 2024.
[41]
X. Li, M. Liu, H. Zhang, C. Yu, J. Xu, H. Wu, C. Cheang, Y. Jing, W. Zhang, H. Liu, H. Li, and T. Kong, “Vision-language foundation models as effective robot imitators,” arXiv preprint arXiv: 2311.01378, 2024.
[42]
Y. Tang, W. Yu, J. Tan, H. Zen, A. Faust, and T. Harada, “SayTap: Language to quadrupedal locomotion,” arXiv preprint arXiv: 2306.07580, 2023.
[43]
Z. Zhu, X. Wang, W. Zhao, C. Min, N. Deng, M. Dou, Y. Wang, B. Shi, K. Wang, C. Zhang, Y. You, Z. Zhang, D. Zhao, L. Xiao, J. Zhao, J. Lu, and G. Huang, “Is sora a world simulator? A comprehensive survey on general world models and beyond,” arXiv preprint arXiv: 2405.03520, 2024.
[44]
Z. Ding, A. Zhang, Y. Tian, and Q. Zheng, “Diffusion world model: Future modeling beyond step-by-step rollout for offline reinforcement learning,” arXiv preprint arXiv: 2402.03570, 2024.
[45]
V. Micheli, E. Alonso, and F. Fleuret, “Transformers are sample-efficient world models,” in Int. Conf. on Learning Representations, pp. 1−21, 2023.
[46]
M. Assran, Q. Duval, I. Misra, P. Bojanowski, P. Vincent, M. Rabbat, Y. LeCun, and N. Ballas, “Self-supervised learning from images with a joint-embedding predictive architecture,” in IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2023, pp. 15619–15629.
[47]
A. Herzog, K. Rao, K. Hausman, Y. Lu, P. Wohlhart, M. Yan, J. Lin, M. G. Arenas, T. Xiao, D. Kappler, D. Ho, J. Rettinghouse, Y. Chebotar, K.-H. Lee, K. Gopalakrishnan, R. Julian, A. Li, C. Fu, B. Wei, S. Ramesh, K. Holden, K. Kleiven, D. Rendleman, S. Kirmani, J. Bingham, J. Weisz, Y. Xu, W. Lu, M. Bennice, C. Fong, D. Do, J. Lam, Y. Bai, B. Holson, M. Quinlan, N. Brown, M. Kalakrishnan, J. Ibarz, P. Pastor, and S. Levine, “Deep RL at scale: Sorting waste in office buildings with a fleet of mobile manipulators,” arXiv preprint arXiv: 2305.03270, 2023.
[48]
I. Dasgupta, C. Kaeser-Chen, K. Marino, A. Ahuja, S. Babayan, F. Hill, and R. Fergus, “Collaborating with language models for embodied reasoning,” arXiv preprint arXiv: 2302.00763, 2023.
[49]
Y. Hu, F. Lin, T. Zhang, L. Yi, and Y. Gao, “Look before you leap: Unveiling the power of GPT-4V in robotic vision-language planning,” arXiv preprint arXiv: 2311.17842, 2023.
[50]
K. Li, J. Wang, L. Yang, C. Lu, and B. Dai, “Semgrasp: Semantic grasp generation via language aligned discretization,” arXiv preprint arXiv: 2404.03590, 2024.
[51]
M. Shridhar, L. Manuelli, and D. Fox, “CLIPort: What and where pathways for robotic manipulation,” in Conf. on Robot Learning. PMLR, pp. 894−906, 2021.
[52]
A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, “Learning transferable visual models from natural language supervision,” in Int. Conf. on Machine Learning. PMLR, 2021, pp. 8748–8763.
[53]
J. Hwangbo, J. Lee, A. Dosovitskiy, D. Bellicoso, V. Tsounis, V. Koltun, and M. Hutter, “Learning agile and dynamic motor skills for legged robots,” Science Robotics, vol. 4, no. 26, p. eaau5872, 2019. doi: 10.1126/scirobotics.aau5872
[54]
J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter, “Learning quadrupedal locomotion over challenging terrain,” Science Robotics, vol. 5, no. 47, p. eabc5986, 2020. doi: 10.1126/scirobotics.abc5986
[55]
Z. Fu, X. Cheng, and D. Pathak, “Deep whole-body control: Learning a unified policy for manipulation and locomotion,” in Conf. on Robot Learning. PMLR, vol. 205, pp. 138–149, 2022.
[56]
G. Bhardwaj, S. Dasgupta, N. Sukavanam, and R. Balasubramanian, “Soft soil gait planning and control for biped robot using deep deterministic policy gradient approach,” arXiv preprint arXiv: 2306.08063, 2023.
[57]
M. E. Moran, “Evolution of robotic arms,” Journal of Robotic Surgery, vol. 1, no. 2, pp. 103–111, 2007. doi: 10.1007/s11701-006-0002-x
[58]
A. Billard and D. Kragic, “Trends and challenges in robot manipulation,” Science, vol. 364, no. 6446, p. eaat8414, 2019. doi: 10.1126/science.aat8414
[59]
T. Luettel, M. Himmelsbach, and H.-J. Wuensche, “Autonomous ground vehicles—Concepts and a path to the future,” Proc. the IEEE, vol. 100, no. Special Centennial Issue, pp. 1831–1839, 2012.
[60]
B. Siciliano, O. Khatib, and T. Kröger, Springer Handbook of Robotics. Springer, 2008.
[61]
A. Ugenti, R. Galati, G. Mantriota, and G. Reina, “Analysis of an all-terrain tracked robot with innovative suspension system,” Mechanism and Machine Theory, vol. 182, p. 105237, 2023. doi: 10.1016/j.mechmachtheory.2023.105237
[62]
S. A. H. Mohsan, M. A. Khan, F. Noor, I. Ullah, and M. H. Alsharif, “Towards the unmanned aerial vehicles (UAVs): A comprehensive review,” Drones, vol. 6, no. 6, p. 147, 2022. doi: 10.3390/drones6060147
[63]
S. Shakoor, Z. Kaleem, M. I. Baig, O. Chughtai, and N. Long, “Role of UAVs in public safety communications: Energy efficiency perspective,” IEEE Access, vol. 7, 2019.
[64]
T. Fukuda, P. Dario, and G.-Z. Yang, “Humanoid robotics—History, current state of the art, and challenges,” Science Robotics, vol. 2, no. 13, p. eaar4043, 2017. doi: 10.1126/scirobotics.aar4043
[65]
Y. Tong, H. Liu, and Z. Zhang, “Advancements in humanoid robots: A comprehensive review and future prospects,” IEEE/CAA Journal of Automatica Sinica, vol. 11, no. 2, pp. 301–328, 2024. doi: 10.1109/JAS.2023.124140
[66]
A. K. Vaskov, P. P. Vu, N. North, A. J. Davis, T. A. Kung, D. H. Gates, P. S. Cederna, and C. A. Chestek, “Surgically implanted electrodes enable real-time finger and grasp pattern recognition for prosthetic hands,” IEEE Trans. Robotics, vol. 38, no. 5, pp. 2841–2857, 2022. doi: 10.1109/TRO.2022.3170720
[67]
Y. Fan, Z. Pei, C. Wang, M. Li, Z. Tang, and Q. Liu, “A review of quadruped robots: Structure, control, and autonomous motion,” Advanced Intelligent Systems, p. 2300783, 2024.
[68]
M. Hutter, C. Gehring, D. Jud, A. Lauber, C. D. Bellicoso, V. Tsounis, J. Hwangbo, K. Bodie, P. Fankhauser, M. Bloesch, R. Diethelm, S. Bachmann, A. Melzer, and M. Hoepflinger, “ANYmal - a highly mobile and dynamic quadrupedal robot,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 38–44, 2016.
[69]
G. Bellegarda, Y. Chen, Z. Liu, and Q. Nguyen, “Robust high-speed running for quadruped robots via deep reinforcement learning,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems. pp. 10364–10370, 2022.
[70]
J. E. Bares and D. S. Wettergreen, “Dante II: Technical description, results, and lessons learned,” The Int. Journal of Robotics Research, vol. 18, no. 7, pp. 621–649, 1999. doi: 10.1177/02783649922066475
[71]
X. Zhou, X. Wen, Z. Wang, Y. Gao, H. Li, Q. Wang, T. Yang, H. Lu, Y. Cao, C. Xu, and F. Gao, “Swarm of micro flying robots in the wild,” Science Robotics, vol. 7, no. 66, p. eabm5954, 2022. doi: 10.1126/scirobotics.abm5954
[72]
L. P. Ramalepa and R. S. Jamisola Jr, “A review on cooperative robotic arms with mobile or drones bases,” Int. Journal of Automation and Computing, vol. 18, no. 4, pp. 536–555, 2021. doi: 10.1007/s11633-021-1299-7
[73]
M. Orsag, C. Korpela, S. Bogdan, and P. Oh, “Dexterous aerial robots—mobile manipulation using unmanned aerial systems,” IEEE Trans. Robotics, vol. 33, no. 6, pp. 1453–1466, 2017. doi: 10.1109/TRO.2017.2750693
[74]
S. Dasari, F. Ebert, S. Tian, S. Nair, B. Bucher, K. Schmeckpeper, S. Singh, S. Levine, and C. Finn, “Robonet: Large-scale multi-robot learning,” arXiv preprint arXiv: 1910.11215, 2019.
[75]
O. X.-E. Collaboration, A. O'Neill, A. Rehman, A. Maddukuri, A. Gupta, A. Padalkar, A. Lee, A. Pooley, A. Gupta, A. Mandlekar, A. Jain, A. Tung, A. Bewley, A. Herzog, A. Irpan, A. Khazatsky, A. Rai, A. Gupta, A. Wang, A. Kolobov, A. Singh, A. Garg, A. Kembhavi, A. Xie, A. Brohan, A. Raffin, A. Sharma, A. Yavary, A. Jain, A. Balakrishna, A. Wahid, B. Burgess-Limerick, B. Kim, B. Schölkopf, B. Wulfe, B. Ichter, C. Lu, C. Xu, C. Le, C. Finn, C. Wang, C. Xu, C. Chi, C. Huang, C. Chan, C. Agia, C. Pan, C. Fu, C. Devin, D. Xu, D. Morton, D. Driess, D. Chen, D. Pathak, D. Shah, D. Büchler, D. Jayaraman, D. Kalashnikov, D. Sadigh, E. Johns, E. Foster, F. Liu, F. Ceola, F. Xia, F. Zhao, F. V. Frujeri, F. Stulp, G. Zhou, G. S. Sukhatme, G. Salhotra, G. Yan, G. Feng, G. Schiavi, G. Berseth, G. Kahn, G. Yang, G. Wang, H. Su, H.-S. Fang, H. Shi, H. Bao, H. B. Amor, H. I. Christensen, H. Furuta, H. Walke, H. Fang, H. Ha, I. Mordatch, I. Radosavovic, I. Leal, J. Liang, J. Abou-Chakra, J. Kim, J. Drake, J. Peters, J. Schneider, J. Hsu, J. Bohg, J. Bingham, J. Wu, J. Gao, J. Hu, J. Wu, J. Wu, J. Sun, J. Luo, J. Gu, J. Tan, J. Oh, J. Wu, J. Lu, J. Yang, J. Malik, J. Silvério, J. Hejna, J. Booher, J. Tompson, J. Yang, J. Salvador, J. J. Lim, J. Han, K. Wang, K. Rao, K. Pertsch, K. Hausman, K. Go, K. Gopalakrishnan, K. Goldberg, K. Byrne, K. Oslund, K. Kawaharazuka, K. Black, K. Lin, K. Zhang, K. Ehsani, K. Lekkala, K. Ellis, K. Rana, K. Srinivasan, K. Fang, K. P. Singh, K.-H. Zeng, K. Hatch, K. Hsu, L. Itti, L. Y. Chen, L. Pinto, L. Fei-Fei, L. Tan, L. J. Fan, L. Ott, L. Lee, L. Weihs, M. Chen, M. Lepert, M. Memmel, M. Tomizuka, M. Itkina, M. G. Castro, M. Spero, M. Du, M. Ahn, M. C. Yip, M. Zhang, M. Ding, M. Heo, M. K. Srirama, M. Sharma, M. J. Kim, N. Kanazawa, N. Hansen, N. Heess, N. J. Joshi, N. Suenderhauf, N. Liu, N. D. Palo, N. M. M. Shafiullah, O. Mees, O. Kroemer, O. Bastani, P. R. Sanketi, P. T. Miller, P. Yin, P. Wohlhart, P. Xu, P. D. Fagan, P. Mitrano, P. Sermanet, P. Abbeel, P. Sundaresan, Q. Chen, Q. Vuong, R. Rafailov, R. Tian, R. Doshi, R. Mart'in-Mart'in, R. Baijal, R. Scalise, R. Hendrix, R. Lin, R. Qian, R. Zhang, R. Mendonca, R. Shah, R. Hoque, R. Julian, S. Bustamante, S. Kirmani, S. Levine, S. Lin, S. Moore, S. Bahl, S. Dass, S. Sonawani, S. Song, S. Xu, S. Haldar, S. Karamcheti, S. Adebola, S. Guist, S. Nasiriany, S. Schaal, S. Welker, S. Tian, S. Ramamoorthy, S. Dasari, S. Belkhale, S. Park, S. Nair, S. Mirchandani, T. Osa, T. Gupta, T. Harada, T. Matsushima, T. Xiao, T. Kollar, T. Yu, T. Ding, T. Davchev, T. Z. Zhao, T. Armstrong, T. Darrell, T. Chung, V. Jain, V. Vanhoucke, W. Zhan, W. Zhou, W. Burgard, X. Chen, X. Chen, X. Wang, X. Zhu, X. Geng, X. Liu, X. Liangwei, X. Li, Y. Pang, Y. Lu, Y. J. Ma, Y. Kim, Y. Chebotar, Y. Zhou, Y. Zhu, Y. Wu, Y. Xu, Y. Wang, Y. Bisk, Y. Dou, Y. Cho, Y. Lee, Y. Cui, Y. Cao, Y.-H. Wu, Y. Tang, Y. Zhu, Y. Zhang, Y. Jiang, Y. Li, Y. Li, Y. Iwasawa, Y. Matsuo, Z. Ma, Z. Xu, Z. J. Cui, Z. Zhang, Z. Fu, and Z. Lin, “Open X-Embodiment: Robotic learning datasets and RT-X models,” in Int. Conf. on Robotics and Automation. IEEE, 2024, pp. 6892−6903.
[76]
A. Khazatsky, K. Pertsch, S. Nair, A. Balakrishna, S. Dasari, S. Karamcheti, S. Nasiriany, M. K. Srirama, L. Y. Chen, K. Ellis, P. D. Fagan, J. Hejna, M. Itkina, M. Lepert, Y. J. Ma, P. T. Miller, J. Wu, S. Belkhale, S. Dass, H. Ha, A. Jain, A. Lee, Y. Lee, M. Memmel, S. Park, I. Radosavovic, K. Wang, A. Zhan, K. Black, C. Chi, K. B. Hatch, S. Lin, J. Lu, J. Mercat, A. Rehman, P. R. Sanketi, A. Sharma, C. Simpson, Q. Vuong, H. R. Walke, B. Wulfe, T. Xiao, J. H. Yang, A. Yavary, T. Z. Zhao, C. Agia, R. Baijal, M. G. Castro, D. Chen, Q. Chen, T. Chung, J. Drake, E. P. Foster, J. Gao, V. Guizilini, D. A. Herrera, M. Heo, K. Hsu, J. Hu, M. Z. Irshad, D. Jackson, C. Le, Y. Li, K. Lin, R. Lin, Z. Ma, A. Maddukuri, S. Mirchandani, D. Morton, T. Nguyen, A. O’Neill, R. Scalise, D. Seale, V. Son, S. Tian, E. Tran, A. E. Wang, Y. Wu, A. Xie, J. Yang, P. Yin, Y. Zhang, O. Bastani, G. Berseth, J. Bohg, K. Goldberg, A. Gupta, A. Gupta, D. Jayaraman, J. J. Lim, J. Malik, R. Martín-Martín, S. Ramamoorthy, D. Sadigh, S. Song, J. Wu, M. C. Yip, Y. Zhu, T. Kollar, S. Levine, and C. Finn, “Droid: A large-scale in-the-wild robot manipulation dataset,” arXiv preprint arXiv: 2403.12945, 2024.
[77]
C. Chi, Z. Xu, C. Pan, E. Cousineau, B. Burchfiel, S. Feng, R. Tedrake, and S. Song, “Universal manipulation interface: In-the-wild robot teaching without in-the-wild robots,” arXiv preprint arXiv: 2402.10329, 2024.
[78]
Z. Fu, T. Z. Zhao, and C. Finn, “Mobile ALOHA: Learning bimanual mobile manipulation with low-cost whole-body teleoperation,” arXiv preprint arXiv: 2401.02117, 2024.
[79]
S. Luo, Q. Peng, J. Lv, K. Hong, K. R. Driggs-Campbell, C. Lu, and Y.-L. Li, “Human-agent joint learning for efficient robot manipulation skill acquisition,” arXiv preprint arXiv: 2407.00299, 2024.
[80]
E. Kolve, R. Mottaghi, W. Han, E. VanderBilt, L. Weihs, A. Herrasti, M. Deitke, K. Ehsani, D. Gordon, Y. Zhu, A. Kembhavi, A. Gupta, and A. Farhadi, “AI2-Thor: An interactive 3D environment for visual AI,” arXiv preprint arXiv: 1712.05474, 2017.
[81]
X. Puig, E. Undersander, A. Szot, M. D. Côte, R. Partsey, J. Yang, R. Desai, A. W. Clegg, M. Hlavač, T. Min, T. Gervet, V. Vondruš, V.-P. Berges, J. Turner, O. Maksymets, Z. Kira, M. Kalakrishnan, J. Malik, D. S. Chaplot, U. Jain, D. Batra, A. Rai, and R. Mottaghi, “Habitat 3.0: A co-habitat for humans, avatars and robots,” arXiv preprint arXiv: 2310.13724, 2023.
[82]
F. Xia, A. R. Zamir, Z.-Y. He, A. Sax, J. Malik, and S. Savarese, “Gibson Env: Real-world perception for embodied agents,” in IEEE/CVF Conf. on Computer Vision and Pattern Recognition. IEEE, pp. 9068−9079, 2018.
[83]
A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” in Proc. Annual Conf. on Robot Learning, 2017, pp. 1–16.
[84]
E. Todorov, T. Erez, and Y. Tassa, “MuJoCo: A physics engine for model-based control,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems. IEEE, 2012, pp. 5026–5033.
[85]
Y. Wang, Z. Xian, F. Chen, T.-H. Wang, Y. Wang, K. Fragkiadaki, Z. Erickson, D. Held, and C. Gan, “Robogen: Towards unleashing infinite data for automated robot learning via generative simulation,” arXiv preprint arXiv: 2311.01455, 2024.
[86]
L. Wang, Y. Ling, Z. Yuan, M. Shridhar, C. Bao, Y. Qin, B. Wang, H. Xu, and X. Wang, “GenSim: Generating robotic simulation tasks via large language models,” arXiv preprint arXiv: 2310.01361, 2024.
[87]
Y. Du, M. Yang, B. Dai, H. Dai, O. Nachum, J. B. Tenenbaum, D. Schuurmans, and P. Abbeel, “Learning universal policies via text-guided video generation,” in Conf. on Neural Information Processing Systems, pp. 9156–9172, 2023.
[88]
Z. Wan, Y. Xie, C. Zhang, Z. Lin, Z. Wang, S. Stepputtis, D. Ramanan, and K. P. Sycara, “Instructpart: Affordance-based part segmentation from language instruction,” in AAAI-2024 Workshop on Public Sector LLMs: Algorithmic and Sociotechnical Design, 2024.
[89]
T. Gupta, W. Gong, C. Ma, N. Pawlowski, A. Hilmkil, M. Scetbon, A. Famoti, A. J. Llorens, J. Gao, S. Bauer, D. Kragic, B. Schölkopf, and C. Zhang, “The essential role of causality in foundation world models for embodied AI,” arXiv preprint arXiv: 2402.06665, 2024.
[90]
M. Kaspar, J. D. M. Osorio, and J. Bock, “Sim2real transfer for reinforcement learning without dynamics randomization,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2020, pp. 4383–4388.
[91]
B. Planche, Z. Wu, K. Ma, S. Sun, S. Kluckner, T. Chen, A. Hutter, S. Zakharov, H. Kosch, and J. Ernst, “DepthSynth: Real-time realistic synthetic data generation from CAD models for 2.5D recognition,” in Int. Conf. on 3D Vision, pp. 1−10, 2017.
[92]
J. Truong, S. Chernova, and D. Batra, “Bi-directional domain adaptation for sim2real transfer of embodied navigation agents,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 2634–2641, 2021. doi: 10.1109/LRA.2021.3062303
[93]
J. Shi, Y. Jin, D. Li, H. Niu, Z. Jin, and H. Wang, “ASGrasp: Generalizable transparent object reconstruction and grasping from RGB-D active stereo camera,” arXiv preprint arXiv: 2405.05648, 2024.
[94]
Y. Mu, Q. Zhang, M. Hu, W. Wang, M. Ding, J. Jin, B. Wang, J. Dai, Y. Qiao, and P. Luo, “EmbodiedGPT: Vision-language pre-training via embodied chain of thought,” Advances in Neural Information Processing Systems, vol. 36, 2024.
[95]
D. Kalashnikov, J. Varley, Y. Chebotar, B. Swanson, R. Jonschkowski, C. Finn, S. Levine, and K. Hausman, “MT-Opt: Continuous multi-task robotic reinforcement learning at scale,” arXiv preprint arXiv: 2104.08212, 2021.
[96]
J. Long, Z. Wang, Q. Li, J. Gao, L. Cao, and J. Pang, “Hybrid internal model: A simple and efficient learner for agile legged locomotion,” arXiv preprint arXiv: 2312.11460, 2023.
[97]
G. Cao, Y. Zhou, D. Bollegala, and S. Luo, “Spatio-temporal attention model for tactile texture recognition,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2020, pp. 9896–9902.
[98]
C. Wu, J. Chen, Q. Cao, J. Zhang, Y. Tai, L. Sun, and K. Jia, “Grasp proposal networks: An end-to-end solution for visual learning of robotic grasps,” Advances in Neural Information Processing Systems, vol. 33, pp. 13174–13184, 2020.
[99]
S. Ainetter and F. Fraundorfer, “End-to-end trainable deep neural network for robotic grasp detection and semantic segmentation from RGB,” in Int. Conf. on Robotics and Automation. IEEE, 2021, pp. 13452–13458.
[100]
M. Shridhar, L. Manuelli, and D. Fox, “CLIPort: What and where pathways for robotic manipulation,” in Conf. on Robot Learning. PMLR, 2022, pp. 894–906.
[101]
Z. Qi, R. Dong, S. Zhang, H. Geng, C. Han, Z. Ge, H. Wang, L. Yi, and K. Ma, “ShapeLLM: Universal 3D object understanding for embodied interaction,” arXiv preprint arXiv: 2402.17766, 2024.
[102]
L. Annamalai and C. S. Thakur, “EventASEG: An event-based asynchronous segmentation of road with likelihood attention,” IEEE Robotics and Automation Letters, vol. 9, no. 8, pp. 6951–6958, 2024.
[103]
T. Carta, C. Romac, T. Wolf, S. Lamprier, O. Sigaud, and P.-Y. Oudeyer, “Grounding large language models in interactive environments with online reinforcement learning,” in Int. Conf. on Machine Learning. PMLR, 2023, pp. 3676–3713.
[104]
H. Zhang, W. Du, J. Shan, Q. Zhou, Y. Du, J. B. Tenenbaum, T. Shu, and C. Gan, “Building cooperative embodied agents modularly with large language models,” in Int. Conf. on Learning Representations, 2024.
[105]
W. Huang, F. Xia, T. Xiao, H. Chan, J. Liang, P. Florence, A. Zeng, J. Tompson, I. Mordatch, Y. Chebotar, P. Sermanet, N. Brown, T. Jackson, L. Luu, S. Levine, K. Hausman, and B. Ichter, “Inner monologue: Embodied reasoning through planning with language models,” in Annual Conf. on Robot Learning, PMLR, vol. 205, pp. 1769–1782, 2022.
[106]
Y. Long, X. Li, W. Cai, and H. Dong, “Discuss before moving: Visual language navigation via multi-expert discussions,” in IEEE Int. Conf. on Robotics and Automation, pp. 4842−4849, 2024.
[107]
Y. Ding, H. Geng, C. Xu, X. Fang, J. Zhang, S. Wei, Q. Dai, Z. Zhang, and H. Wang, “Open6DOR: Benchmarking open-instruction 6-DoF object rearrangement and a VLM-based approach,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 7359–7366, 2024.
[108]
W. Huang, C. Wang, R. Zhang, Y. Li, J. Wu, and F.-F. Li, “VoxPoser: Composable 3D value maps for robotic manipulation with language models,” arXiv preprint arXiv: 2307.05973, 2023.
[109]
H. Zhang, J. Xu, Y. Mo, and T. Kong, “InViG: Open-ended interactive visual grounding in human-robot interaction with 500k dialogues,” in Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition Workshops, pp. 5508−5518, 2024.
[110]
Z. Chen, J. Wu, W. Wang, W. Su, G. Chen, S. Xing, M. Zhong, Q. Zhang, X. Zhu, L. Lu, B. Li, P. Luo, T. Lu, Y. Qiao, and J. Dai, “InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks,” arXiv preprint arXiv: 2312.14238, 2023.
[111]
R. Dong, C. Han, Y. Peng, Z. Qi, Z. Ge, J. Yang, L. Zhao, J. Sun, H. Zhou, H. Wei, X. Kong, X. Zhang, K. Ma, and L. Yi, “DreamLLM: Synergistic multimodal comprehension and creation,” in Int. Conf. on Learning Representations, 2024.
[112]
M. Ahn, D. Dwibedi, C. Finn, M. G. Arenas, K. Gopalakrishnan, K. Hausman, B. Ichter, A. Irpan, N. Joshi, R. Julian, S. Kirmani, I. Leal, E. Lee, S. Levine, Y. Lu, I. Leal, S. Maddineni, K. Rao, D. Sadigh, P. Sanketi, P. Sermanet, Q. Vuong, S. Welker, F. Xia, T. Xiao, P. Xu, S. Xu, and Z. Xu, “AutoRT: Embodied foundation models for large scale orchestration of robotic agents,” arXiv preprint arXiv: 2401.12963, 2024.
[113]
S. Gao, Y. Dai, and A. Nathan, “Tactile and vision perception for intelligent humanoids,” Advanced Intelligent Systems, vol. 4, no. 2, p. 2100074, 2022. doi: 10.1002/aisy.202100074
[114]
H. Liu, D. Guo, F. Sun, W. Yang, S. Furber, and T. Sun, “Embodied tactile perception and learning,” Brain Science Advances, vol. 6, no. 2, pp. 132–158, 2020. doi: 10.26599/BSA.2020.9050012
[115]
H. Li and L. Cheng, “A systematic review on hand exoskeletons from the mechatronics aspect,” IEEE/ASME Trans. Mechatronics, pp. 1−19, 2024.
[116]
W. Yuan, Y. Mo, S. Wang, and E. H. Adelson, “Active clothing material perception using tactile sensing and deep learning,” in IEEE Int. Conf. on Robotics and Automation, 2018, pp. 4842–4849.
[117]
Z. Li, L. Cheng, Z. Liu, J. Wei, and Y. Wang, “Focers: An ultrasensitive and robust soft optical 3D tactile sensor,” Soft Robotics, 2025.
[118]
J. Yang, X. Chen, S. Qian, N. Madaan, M. Iyengar, D. F. Fouhey, and J. Chai, “LLM-Grounder: Open-vocabulary 3D visual grounding with large language model as an agent,” in Int. Conf. on Robotics and Automation, 2024, pp. 7694–7701.
[119]
Z. Li, X. Wu, H. Du, H. Nghiem, and G. Shi, “Benchmark evaluations, applications, and challenges of large vision language models: A survey,” arXiv preprint arXiv: 2501.02189, 2025.
[120]
G. Shi, W. Hönig, X. Shi, Y. Yue, and S.-J. Chung, “Neural-swarm2: Planning and control of heterogeneous multirotor swarms using learned interactions,” IEEE Trans. Robotics, vol. 38, no. 2, pp. 1063–1079, 2021.
[121]
A. Gupta, S. Savarese, S. Ganguli, and L. Fei-Fei, “Embodied intelligence via learning and evolution,” Nature Communications, vol. 12, no. 1, p. 5721, 2021. doi: 10.1038/s41467-021-25874-z
[122]
Y. Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wang, L. Lu, X. Jia, Q. Liu, J. Dai, Y. Qiao, and H. Li, “Planning-oriented autonomous driving,” in Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pp. 17853–17862, 2023.
[123]
K. L. Downing, “Neuroscientific implications for situated and embodied artificial intelligence,” Connection Science, vol. 19, no. 1, pp. 75–104, 2007. doi: 10.1080/09540090701192584
[124]
J. Chen, J. Li, Y. Huang, C. Garrett, D. Sun, C. Fan, A. Hofmann, C. Mueller, S. Koenig, and B. C. Williams, “Cooperative task and motion planning for multi-arm assembly systems,” arXiv preprint arXiv: 2203.02475, 2022.
[125]
X. Liu, X. Li, D. Guo, S. Tan, H. Liu, and F. Sun, “Embodied multi-agent task planning from ambiguous instruction,” in Robotics: Science and Systems, pp. 1–14, 2022.
[126]
X. Liu, D. Guo, H. Liu, and F. Sun, “Multi-agent embodied visual semantic navigation with scene prior knowledge,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 3154–3161, 2022. doi: 10.1109/LRA.2022.3145964
[127]
S. Sukhbaatar, A. Szlam, and R. Fergus, “Learning multiagent communication with backpropagation,” Advances in Neural Information Processing Systems, vol. 29, 2016.
[128]
Y. Hoshen, “VAIN: Attentional multi-agent predictive modeling,” Advances in Neural Information Processing Systems, vol. 30, pp. 6283–6292, 2017.
[129]
J. Jiang and Z. Lu, “Learning attentional communication for multi-agent cooperation,” Advances in Neural Information Processing Systems, vol. 31, pp. 646−654, 2018.
[130]
A. Das, T. Gervet, J. Romoff, D. Batra, D. Parikh, M. Rabbat, and J. Pineau, “TarMAC: Targeted multi-agent communication,” in Int. Conf. on Machine Learning, PMLR, vol. 97, 2019, pp. 1538–1546.
[131]
Y.-C. Liu, J. Tian, N. Glaser, and Z. Kira, “When2com: Multi-agent perception via communication graph grouping,” in Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2020.
[132]
Y.-C. Liu, J. Tian, C.-Y. Ma, N. Glaser, C.-W. Kuo, and Z. Kira, “Who2com: Collaborative perception via learnable handshake communication,” in Int. Conf. on Robotics and Automation. IEEE, 2020, pp. 6876–6883.
[133]
Y. Hu, S. Fang, Z. Lei, Y. Zhong, and S. Chen, “Where2comm: Communication-efficient collaborative perception via spatial confidence maps,” Advances in Neural Information Processing Systems, vol. 35, pp. 4874–4886, 2022.
[134]
K. Yang, D. Yang, J. Zhang, H. Wang, P. Sun, and L. Song, “What2comm: Towards communication-efficient collaborative perception via feature decoupling,” in ACM Int. Conf. on Multimedia, 2023, pp. 7686–7695.
[135]
D. Yang, K. Yang, Y. Wang, J. Liu, Z. Xu, R. Yin, P. Zhai, and L. Zhang, “How2comm: Communication-efficient and collaboration-pragmatic multi-agent perception,” Advances in Neural Information Processing Systems, vol. 36, 2024.
[136]
X. Liu, D. Guo, X. Zhang, and H. Liu, “Heterogeneous embodied multi-agent collaboration,” IEEE Robotics and Automation Letters, vol. 9, no. 2, pp. 5377−5384, 2024.
[137]
V. Sharma, P. Goyal, K. Lin, G. Thattai, Q. Gao, and G. S. Sukhatme, “Ch-MARL: A multimodal benchmark for cooperative, heterogeneous multi-agent reinforcement learning,” arXiv preprint arXiv: 2208.13626, 2022.
[138]
R. Gong, X. Gao, Q. Gao, S. Shakiah, G. Thattai, and G. S. Sukhatme, “Lemma: Learning language-conditioned multi-robot manipulation,” IEEE Robotics and Automation Letters, vol. 8, no. 10, pp. 6835−6842, 2023.
[139]
S. Patel, S. Wani, U. Jain, A. G. Schwing, S. Lazebnik, M. Savva, and A. X. Chang, “Interpretation of emergent communication in heterogeneous collaborative embodied agents,” in Proc. the IEEE/CVF Int. Conf. on Computer Vision, 2021, pp. 15953–15963.
[140]
Y. Du, S. Li, A. Torralba, J. B. Tenenbaum, and I. Mordatch, “Improving factuality and reasoning in language models through multiagent debate,” arXiv preprint arXiv: 2305.14325, 2023.
[141]
R. Galin and R. Meshcheryakov, “Review on human-robot interaction during collaboration in a shared workspace,” in Interactive Collaborative Robotics: 4th Int. Conf., Istanbul, Turkey. Springer, 2019, pp. 63–74.
[142]
T. B. Sheridan, “Human–robot interaction: Status and challenges,” Human Factors, vol. 58, no. 4, pp. 525–532, 2016. doi: 10.1177/0018720816644364
[143]
A. Das, S. Datta, G. Gkioxari, S. Lee, D. Parikh, and D. Batra, “Embodied question answering,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition Workshops, pp. 2054−2063, 2017.
[144]
S. Tan, M. Ge, D. Guo, H. Liu, and F. Sun, “Knowledge-based embodied question answering,” IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 4749−4756, 2021.
[145]
L. Yu, X. Chen, G. Gkioxari, M. Bansal, T. L. Berg, and D. Batra, “Multi-target embodied question answering,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 6309−6318, 2019.
[146]
U. Jain, L. Weihs, E. Kolve, A. Farhadi, S. Lazebnik, A. Kembhavi, and A. Schwing, “A cordial sync: Going beyond marginal policies for multi-agent embodied tasks,” in European Conf. on Computer Vision, 2020, pp. 471–490.
[147]
U. Jain, L. Weihs, E. Kolve, M. Rastegari, S. Lazebnik, A. Farhadi, A. G. Schwing, and A. Kembhavi, “Two body problem: Collaborative visual task completion,” in Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2019, pp. 6689–6699.
[148]
A. Szot, U. Jain, D. Batra, Z. Kira, R. Desai, and A. Rai, “Adaptive coordination in social embodied rearrangement,” in Int. Conf. on Machine Learning. PMLR, 2023, pp. 33365–33380.
[149]
Y. Liu, X. Cao, T. Chen, Y. Jiang, J. You, M. Wu, X. Wang, M. Feng, Y. Jin, and J. Chen, “From screens to scenes: A survey of embodied AI in healthcare,” arXiv preprint arXiv: 2501.07468, 2025.
[150]
S. E. Salcudean, H. Moradi, D. G. Black, and N. Navab, “Robot-assisted medical imaging: A review,” Proc. the IEEE, vol. 110, no. 7, pp. 951–967, 2022. doi: 10.1109/JPROC.2022.3162840
[151]
T. J. Loftus, M. S. Altieri, J. A. Balch, K. L. Abbott, J. Choi, J. S. Marwaha, D. A. Hashimoto, G. A. Brat, Y. Raftopoulos, H. L. Evans, G. P. Jackson, D. S. Walsh, C. J. Tignanelli, “Artificial intelligence–enabled decision support in surgery: State-of-the-art and future directions,” Annals of Surgery, vol. 278, no. 1, pp. 51–58, 2023. doi: 10.1097/SLA.0000000000005853
[152]
S. Jana, G. F. Italiano, M. J. Kashyop, A. L. Konstantinidis, E. Kosinas, and P. S. Mandal, “Online drone scheduling for last-mile delivery,” in Int. Colloquium on Structural Information and Communication Complexity. Springer, 2024, pp. 488–493.
[153]
N. Cherif, W. Jaafar, H. Yanikomeroglu, and A. Yongacoglu, “RL-based cargo-UAV trajectory planning and cell association for minimum handoffs, disconnectivity, and energy consumption,” IEEE Trans. on Vehicular Technology, vol. 73, no. 5, pp. 7304−7309, 2023.
[154]
R. Bokade, X. Jin, and C. Amato, “Multi-agent reinforcement learning based on representational communication for large-scale traffic signal control,” IEEE Access, vol. 11, pp. 47646–47658, 2023. doi: 10.1109/ACCESS.2023.3275883
[155]
H. Zhang, S.-H. Chan, J. Zhong, J. Li, P. Kolapo, S. Koenig, Z. Agioutantis, S. Schafrik, and S. Nikolaidis, “Multi-robot geometric task-and-motion planning for collaborative manipulation tasks,” Autonomous Robots, vol. 47, no. 8, pp. 1537–1558, 2023. doi: 10.1007/s10514-023-10148-y
[156]
A. Fenoy, J. Zagoli, F. Bistaffa, and A. Farinelli, “Attention for the allocation of tasks in multi-agent pickup and delivery,” in Proc. the 39th ACM/SIGAPP Symposium on Applied Computing, 2024, pp. 622–629.
[157]
J. Xu, Q. Sun, Q.-L. Han, and Y. Tang, “When embodied AI meets Industry 5.0: Human-centered smart manufacturing,” IEEE/CAA Journal of Automatica Sinica, vol. 12, no. 3, pp. 485–501, 2025. doi: 10.1109/JAS.2025.125327
[158]
W. Xiang, K. Yu, F. Han, L. Fang, D. He, and Q.-L. Han, “Advanced manufacturing in industry 5.0: A survey of key enabling technologies and future trends,” IEEE Trans. Industrial Informatics, vol. 20, no. 2, pp. 1055–1068, 2024. doi: 10.1109/TII.2023.3274224
[159]
Y. Ma, Z. Song, Y. Zhuang, J. Hao, and I. King, “A survey on vision-language-action models for embodied AI,” arXiv preprint arXiv: 2405.14093, 2024.
[160]
A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” arXiv preprint arXiv: 2312.00752, 2023.
[161]
Y. Liu, G. Li, and L. Lin, “Cross-modal causal relational reasoning for event-level visual question answering,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 45, pp. 11624–11641, 2023.