Citation: Z. Li, W. Wu, Y. Guo, J. Sun, and Q.-L. Han, “Embodied multi-agent systems: A review,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 6, pp. 1095–1116, Jun. 2025. doi: 10.1109/JAS.2025.125552
[1]
S. Shalev-Shwartz, S. Shammah, and A. Shashua, “Safe, multi-agent, reinforcement learning for autonomous driving,” arXiv preprint arXiv: 1610.03295, 2016.
[2]
M. Vinyals, J. A. Rodriguez-Aguilar, and J. Cerquides, “A survey on sensor networks from a multiagent perspective,” The Computer Journal, vol. 54, no. 3, pp. 455–470, 2011. doi: 10.1093/comjnl/bxq018
[3]
Y. Rizk, M. Awad, and E. W. Tunstel, “Cooperative heterogeneous multi-robot systems: A survey,” ACM Computing Surveys (CSUR), vol. 52, no. 2, pp. 1–31, 2019.
[4]
K. Zhang, Z. Yang, and T. Başar, “Multi-agent reinforcement learning: A selective overview of theories and algorithms,” Handbook of Reinforcement Learning and Control, pp. 321–384, 2021.
[5]
R. Fjelland, “Why general artificial intelligence will not be realized,” Humanities and Social Sciences Communications, vol. 7, no. 1, pp. 1–9, 2020. doi: 10.1057/s41599-020-0492-6
[6]
Y. Peng, J. Han, Z. Zhang, L. Fan, T. Liu, S. Qi, X. Feng, Y. Ma, Y. Wang, and S.-C. Zhu, “The tong test: Evaluating artificial general intelligence through dynamic embodied physical and social interactions,” Engineering, vol. 34, pp. 12–22, 2024. doi: 10.1016/j.eng.2023.07.006
[7]
A. Cangelosi, J. Bongard, M. H. Fischer, and S. Nolfi, Embodied Intelligence. Berlin, Heidelberg: Springer Berlin Heidelberg, 2015, pp. 697–714.
[8]
J. Duan, S. Yu, H. L. Tan, H. Zhu, and C. Tan, “A survey of embodied AI: From simulators to research tasks,” IEEE Trans. Emerging Topics in Computational Intelligence, vol. 6, no. 2, pp. 230–244, 2022.
[9]
T. Guo, X. Chen, Y. Wang, R. Chang, S. Pei, N. V. Chawla, O. Wiest, and X. Zhang, “Large language model based multi-agents: A survey of progress and challenges,” arXiv preprint arXiv: 2402.01680, 2024.
[10]
N. Roy, I. Posner, T. Barfoot, P. Beaudoin, Y. Bengio, J. Bohg, O. Brock, I. Depatie, D. Fox, D. Koditschek, T. Lozano-Perez, V. Mansinghka, C. Pal, B. Richards, D. Sadigh, S. Schaal, G. Sukhatme, D. Therien, M. Toussaint, and M. Van de Panne, “From machine learning to robotics: Challenges and opportunities for embodied intelligence,” arXiv preprint arXiv: 2110.15245, 2021.
[11]
W. Zhou, X. Zhu, Q.-L. Han, L. Li, X. Chen, S. Wen, and Y. Xiang, “The security of using large language models: A survey with emphasis on ChatGPT,” IEEE/CAA Journal of Automatica Sinica, vol. 12, no. 1, pp. 1–26, 2025. doi: 10.1109/JAS.2024.124983
[12]
X. Zhu, W. Zhou, Q.-L. Han, W. Ma, S. Wen, and Y. Xiang, “When software security meets large language models: A survey,” IEEE/CAA Journal of Automatica Sinica, vol. 12, no. 2, pp. 317–334, 2025. doi: 10.1109/JAS.2024.124971
[13]
Y. Liu, W. Chen, Y. Bai, J. Luo, X. Song, K. Jiang, Z. Li, G. Zhao, J. Lin, G. Li, W. Gao, and L. Lin, “Aligning cyber space with physical world: A comprehensive survey on embodied AI,” arXiv preprint arXiv: 2407.06886, 2024.
[14]
R. Brooks, “Intelligence without representation,” Artificial Intelligence, vol. 47, no. 1−3, pp. 139–159, 1991. doi: 10.1016/0004-3702(91)90053-M
[15]
L. Smith and M. Gasser, “The development of embodied cognition: Six lessons from babies,” Artificial Life, vol. 11, no. 1−2, pp. 13–29, 2005. doi: 10.1162/1064546053278973
[16]
G. Lakoff, M. Johnson, and J. F. Sowa, “Review of philosophy in the flesh: The embodied mind and its challenge to western thought,” Computational Linguistics, vol. 25, no. 4, pp. 631–634, 1999.
[17]
S. K. Ramakrishnan, D. Jayaraman, and K. Grauman, “An exploration of embodied visual exploration,” Int. Journal of Computer Vision, vol. 129, no. 5, pp. 1616–1649, 2021. doi: 10.1007/s11263-021-01437-z
[18]
G. D. Reynolds and K. C. Roth, “The development of attentional biases for faces in infancy: A developmental systems perspective,” Frontiers in Psychology, vol. 9, p. 222, 2018. doi: 10.3389/fpsyg.2018.00222
[19]
A. Dorri, S. S. Kanhere, and R. Jurdak, “Multi-agent systems: A survey,” IEEE Access, vol. 6, pp. 28573–28593, 2018. doi: 10.1109/ACCESS.2018.2831228
[20]
T. T. Nguyen, N. D. Nguyen, and S. Nahavandi, “Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications,” IEEE Trans. Cybernetics, vol. 50, no. 9, pp. 3826–3839, 2020. doi: 10.1109/TCYB.2020.2977374
[21]
G. F. Franklin, J. D. Powell, and A. Emami-Naeini, Feedback Control of Dynamic Systems, 4th ed. Upper Saddle River, NJ: Prentice Hall, 2002.
[22]
N. R. Jennings, K. Sycara, and M. Wooldridge, “A roadmap of agent research and development,” Autonomous Agents and Multi-Agent Systems, vol. 1, pp. 7–38, 1998. doi: 10.1023/A:1010090405266
[23]
C. Zhu, M. Dastani, and S. Wang, “A survey of multi-agent reinforcement learning with communication,” arXiv preprint arXiv: 2203.08975, 2022.
[24]
P. Liu, Y. Zhou, D. Peng, and D. Wu, “Global-attention-based neural networks for vision language intelligence,” IEEE/CAA Journal of Automatica Sinica, vol. 8, no. 7, pp. 1243–1252, 2021.
[25]
b. ichter, A. Brohan, Y. Chebotar, C. Finn, K. Hausman, A. Herzog, D. Ho, J. Ibarz, A. Irpan, E. Jang, R. Julian, D. Kalashnikov, S. Levine, Y. Lu, C. Parada, K. Rao, P. Sermanet, A. T. Toshev, V. Vanhoucke, F. Xia, T. Xiao, P. Xu, M. Yan, N. Brown, M. Ahn, O. Cortes, N. Sievers, C. Tan, S. Xu, D. Reyes, J. Rettinghouse, J. Quiambao, P. Pastor, L. Luu, K.-H. Lee, Y. Kuang, S. Jesmonth, N. J. Joshi, K. Jeffrey, R. J. Ruano, J. Hsu, K. Gopalakrishnan, B. David, A. Zeng, and C. K. Fu, “Do as I can, not as I say: Grounding language in robotic affordances,” in Proc. the 6th Conf. on Robot Learning, PMLR, vol. 205, 2023, pp. 287–318.
[26]
T. Yoneda, J. Fang, P. Li, H. Zhang, T. Jiang, S. Lin, B. Picker, D. Yunis, H. Mei, and M. R. Walter, “Statler: State-maintaining language models for embodied reasoning,” arXiv preprint arXiv: 2306.17840, 2023.
[27]
Y. Ze, G. Yan, Y.-H. Wu, A. Macaluso, Y. Ge, J. Ye, N. Hansen, L. E. Li, and X. Wang, “GNFactor: Multi-task real robot learning with generalizable neural feature fields,” in Conf. on Robot Learning. PMLR, vol. 229, pp. 284–301, 2023.
[28]
W. Shen, G. Yang, A. Yu, J. Wong, L. P. Kaelbling, and P. Isola, “Distilled feature fields enable few-shot language-guided manipulation,” arXiv preprint arXiv: 2308.07931, 2023.
[29]
C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid, and J. J. Leonard, “Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age,” IEEE Trans. Robotics, vol. 32, no. 6, pp. 1309–1332, 2016. doi: 10.1109/TRO.2016.2624754
[30]
Z. Qian, J. Fu, and J. Xiao, “Towards accurate loop closure detection in semantic SLAM with 3D semantic covisibility graphs,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2455–2462, 2022. doi: 10.1109/LRA.2022.3145066
[31]
H. Xu, P. Liu, X. Chen, and S. Shen, “D2SLAM: Decentralized and distributed collaborative visual-inertial SLAM system for aerial swarm,” IEEE Trans. Robotics, vol. 40, pp. 3445–3464, 2024. doi: 10.1109/TRO.2024.3422003
[32]
Y. Tian, Y. Chang, F. Herrera Arias, C. Nieto-Granda, J. P. How, and L. Carlone, “Kimera-multi: Robust, distributed, dense metric-semantic SLAM for multi-robot systems,” IEEE Trans. Robotics, vol. 38, no. 4, pp. 2022–2038, 2022. doi: 10.1109/TRO.2021.3137751
[33]
D. Shah, B. Osinski, B. Ichter, and S. Levine, “LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action,” in Conf. on Robot Learning. PMLR, vol. 205, pp. 492−504, 2023.
[34]
J. Zhang, K. Wang, R. Xu, G. Zhou, Y. Hong, X. Fang, Q. Wu, Z. Zhang, and H. Wang, “NaVid: Video-based VLM plans the next step for vision-and-language navigation,” arXiv preprint arXiv: 2402.15852, 2024.
[35]
C. Huang, O. Mees, A. Zeng, and W. Burgard, “Audio visual language maps for robot navigation,” in IEEE Int. Conf. on Robotics and Automation, 2023, pp. 10608–10615.
[36]
D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016. doi: 10.1038/nature16961
[37]
S. Fujimoto, H. Hoof, and D. Meger, “Addressing function approximation error in actor-critic methods,” in Int. Conf. on Machine Learning. PMLR, 2018, pp. 1587–1596.
[38]
E. Kaufmann, L. Bauersfeld, A. Loquercio, M. Müller, V. Koltun, and D. Scaramuzza, “Champion-level drone racing using deep reinforcement learning,” Nature, vol. 620, no. 7976, pp. 982–987, 2023. doi: 10.1038/s41586-023-06419-4
[39]
W. Huang, P. Abbeel, D. Pathak, and I. Mordatch, “Language models as zero-shot planners: Extracting actionable knowledge for embodied agents,” in Int. Conf. on Machine Learning. PMLR, 2022, pp. 9118–9147.
[40]
J. Zhang, C. Bai, H. He, Z. Wang, B. Zhao, X. Li, and X. Li, “SAM-E: Leveraging visual foundation model with sequence imitation for embodied manipulation,” in Proc. of Machine Learning Research, vol. 235, pp. 58579−58598, 2024.
[41]
X. Li, M. Liu, H. Zhang, C. Yu, J. Xu, H. Wu, C. Cheang, Y. Jing, W. Zhang, H. Liu, H. Li, and T. Kong, “Vision-language foundation models as effective robot imitators,” arXiv preprint arXiv: 2311.01378, 2024.
[42]
Y. Tang, W. Yu, J. Tan, H. Zen, A. Faust, and T. Harada, “SayTap: Language to quadrupedal locomotion,” arXiv preprint arXiv: 2306.07580, 2023.
[43]
Z. Zhu, X. Wang, W. Zhao, C. Min, N. Deng, M. Dou, Y. Wang, B. Shi, K. Wang, C. Zhang, Y. You, Z. Zhang, D. Zhao, L. Xiao, J. Zhao, J. Lu, and G. Huang, “Is sora a world simulator? A comprehensive survey on general world models and beyond,” arXiv preprint arXiv: 2405.03520, 2024.
[44]
Z. Ding, A. Zhang, Y. Tian, and Q. Zheng, “Diffusion world model: Future modeling beyond step-by-step rollout for offline reinforcement learning,” arXiv preprint arXiv: 2402.03570, 2024.
[45]
V. Micheli, E. Alonso, and F. Fleuret, “Transformers are sample-efficient world models,” in Int. Conf. on Learning Representations, pp. 1−21, 2023.
[46]
M. Assran, Q. Duval, I. Misra, P. Bojanowski, P. Vincent, M. Rabbat, Y. LeCun, and N. Ballas, “Self-supervised learning from images with a joint-embedding predictive architecture,” in IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2023, pp. 15619–15629.
[47]
A. Herzog, K. Rao, K. Hausman, Y. Lu, P. Wohlhart, M. Yan, J. Lin, M. G. Arenas, T. Xiao, D. Kappler, D. Ho, J. Rettinghouse, Y. Chebotar, K.-H. Lee, K. Gopalakrishnan, R. Julian, A. Li, C. Fu, B. Wei, S. Ramesh, K. Holden, K. Kleiven, D. Rendleman, S. Kirmani, J. Bingham, J. Weisz, Y. Xu, W. Lu, M. Bennice, C. Fong, D. Do, J. Lam, Y. Bai, B. Holson, M. Quinlan, N. Brown, M. Kalakrishnan, J. Ibarz, P. Pastor, and S. Levine, “Deep RL at scale: Sorting waste in office buildings with a fleet of mobile manipulators,” arXiv preprint arXiv: 2305.03270, 2023.
[48]
I. Dasgupta, C. Kaeser-Chen, K. Marino, A. Ahuja, S. Babayan, F. Hill, and R. Fergus, “Collaborating with language models for embodied reasoning,” arXiv preprint arXiv: 2302.00763, 2023.
[49]
Y. Hu, F. Lin, T. Zhang, L. Yi, and Y. Gao, “Look before you leap: Unveiling the power of GPT-4V in robotic vision-language planning,” arXiv preprint arXiv: 2311.17842, 2023.
[50]
K. Li, J. Wang, L. Yang, C. Lu, and B. Dai, “Semgrasp: Semantic grasp generation via language aligned discretization,” arXiv preprint arXiv: 2404.03590, 2024.
[51]
M. Shridhar, L. Manuelli, and D. Fox, “CLIPort: What and where pathways for robotic manipulation,” in Conf. on Robot Learning. PMLR, pp. 894−906, 2021.
[52]
A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, “Learning transferable visual models from natural language supervision,” in Int. Conf. on Machine Learning. PMLR, 2021, pp. 8748–8763.
[53]
J. Hwangbo, J. Lee, A. Dosovitskiy, D. Bellicoso, V. Tsounis, V. Koltun, and M. Hutter, “Learning agile and dynamic motor skills for legged robots,” Science Robotics, vol. 4, no. 26, p. eaau5872, 2019. doi: 10.1126/scirobotics.aau5872
[54]
J. Lee, J. Hwangbo, L. Wellhausen, V. Koltun, and M. Hutter, “Learning quadrupedal locomotion over challenging terrain,” Science Robotics, vol. 5, no. 47, p. eabc5986, 2020. doi: 10.1126/scirobotics.abc5986
[55]
Z. Fu, X. Cheng, and D. Pathak, “Deep whole-body control: Learning a unified policy for manipulation and locomotion,” in Conf. on Robot Learning. PMLR, vol. 205, pp. 138–149, 2022.
[56]
G. Bhardwaj, S. Dasgupta, N. Sukavanam, and R. Balasubramanian, “Soft soil gait planning and control for biped robot using deep deterministic policy gradient approach,” arXiv preprint arXiv: 2306.08063, 2023.
[57]
M. E. Moran, “Evolution of robotic arms,” Journal of Robotic Surgery, vol. 1, no. 2, pp. 103–111, 2007. doi: 10.1007/s11701-006-0002-x
[58]
A. Billard and D. Kragic, “Trends and challenges in robot manipulation,” Science, vol. 364, no. 6446, p. eaat8414, 2019. doi: 10.1126/science.aat8414
[59]
T. Luettel, M. Himmelsbach, and H.-J. Wuensche, “Autonomous ground vehicles—Concepts and a path to the future,” Proc. the IEEE, vol. 100, no. Special Centennial Issue, pp. 1831–1839, 2012.
[60]
B. Siciliano, O. Khatib, and T. Kröger, Springer Handbook of Robotics. Springer, 2008.
[61]
A. Ugenti, R. Galati, G. Mantriota, and G. Reina, “Analysis of an all-terrain tracked robot with innovative suspension system,” Mechanism and Machine Theory, vol. 182, p. 105237, 2023. doi: 10.1016/j.mechmachtheory.2023.105237
[62]
S. A. H. Mohsan, M. A. Khan, F. Noor, I. Ullah, and M. H. Alsharif, “Towards the unmanned aerial vehicles (UAVs): A comprehensive review,” Drones, vol. 6, no. 6, p. 147, 2022. doi: 10.3390/drones6060147
[63]
S. Shakoor, Z. Kaleem, M. I. Baig, O. Chughtai, and N. Long, “Role of UAVs in public safety communications: Energy efficiency perspective,” IEEE Access, vol. 7, 2019.
[64]
T. Fukuda, P. Dario, and G.-Z. Yang, “Humanoid robotics—History, current state of the art, and challenges,” Science Robotics, vol. 2, no. 13, p. eaar4043, 2017. doi: 10.1126/scirobotics.aar4043
[65]
Y. Tong, H. Liu, and Z. Zhang, “Advancements in humanoid robots: A comprehensive review and future prospects,” IEEE/CAA Journal of Automatica Sinica, vol. 11, no. 2, pp. 301–328, 2024. doi: 10.1109/JAS.2023.124140
[66]
A. K. Vaskov, P. P. Vu, N. North, A. J. Davis, T. A. Kung, D. H. Gates, P. S. Cederna, and C. A. Chestek, “Surgically implanted electrodes enable real-time finger and grasp pattern recognition for prosthetic hands,” IEEE Trans. Robotics, vol. 38, no. 5, pp. 2841–2857, 2022. doi: 10.1109/TRO.2022.3170720
[67]
Y. Fan, Z. Pei, C. Wang, M. Li, Z. Tang, and Q. Liu, “A review of quadruped robots: Structure, control, and autonomous motion,” Advanced Intelligent Systems, p. 2300783, 2024.
[68]
M. Hutter, C. Gehring, D. Jud, A. Lauber, C. D. Bellicoso, V. Tsounis, J. Hwangbo, K. Bodie, P. Fankhauser, M. Bloesch, R. Diethelm, S. Bachmann, A. Melzer, and M. Hoepflinger, “ANYmal - a highly mobile and dynamic quadrupedal robot,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 38–44, 2016.
[69]
G. Bellegarda, Y. Chen, Z. Liu, and Q. Nguyen, “Robust high-speed running for quadruped robots via deep reinforcement learning,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems. pp. 10364–10370, 2022.
[70]
J. E. Bares and D. S. Wettergreen, “Dante II: Technical description, results, and lessons learned,” The Int. Journal of Robotics Research, vol. 18, no. 7, pp. 621–649, 1999. doi: 10.1177/02783649922066475
[71]
X. Zhou, X. Wen, Z. Wang, Y. Gao, H. Li, Q. Wang, T. Yang, H. Lu, Y. Cao, C. Xu, and F. Gao, “Swarm of micro flying robots in the wild,” Science Robotics, vol. 7, no. 66, p. eabm5954, 2022. doi: 10.1126/scirobotics.abm5954
[72]
L. P. Ramalepa and R. S. Jamisola Jr, “A review on cooperative robotic arms with mobile or drones bases,” Int. Journal of Automation and Computing, vol. 18, no. 4, pp. 536–555, 2021. doi: 10.1007/s11633-021-1299-7
[73]
M. Orsag, C. Korpela, S. Bogdan, and P. Oh, “Dexterous aerial robots—mobile manipulation using unmanned aerial systems,” IEEE Trans. Robotics, vol. 33, no. 6, pp. 1453–1466, 2017. doi: 10.1109/TRO.2017.2750693
[74]
S. Dasari, F. Ebert, S. Tian, S. Nair, B. Bucher, K. Schmeckpeper, S. Singh, S. Levine, and C. Finn, “Robonet: Large-scale multi-robot learning,” arXiv preprint arXiv: 1910.11215, 2019.
[75]
O. X.-E. Collaboration, A. O'Neill, A. Rehman, A. Maddukuri, A. Gupta, A. Padalkar, A. Lee, A. Pooley, A. Gupta, A. Mandlekar, A. Jain, A. Tung, A. Bewley, A. Herzog, A. Irpan, A. Khazatsky, A. Rai, A. Gupta, A. Wang, A. Kolobov, A. Singh, A. Garg, A. Kembhavi, A. Xie, A. Brohan, A. Raffin, A. Sharma, A. Yavary, A. Jain, A. Balakrishna, A. Wahid, B. Burgess-Limerick, B. Kim, B. Schölkopf, B. Wulfe, B. Ichter, C. Lu, C. Xu, C. Le, C. Finn, C. Wang, C. Xu, C. Chi, C. Huang, C. Chan, C. Agia, C. Pan, C. Fu, C. Devin, D. Xu, D. Morton, D. Driess, D. Chen, D. Pathak, D. Shah, D. Büchler, D. Jayaraman, D. Kalashnikov, D. Sadigh, E. Johns, E. Foster, F. Liu, F. Ceola, F. Xia, F. Zhao, F. V. Frujeri, F. Stulp, G. Zhou, G. S. Sukhatme, G. Salhotra, G. Yan, G. Feng, G. Schiavi, G. Berseth, G. Kahn, G. Yang, G. Wang, H. Su, H.-S. Fang, H. Shi, H. Bao, H. B. Amor, H. I. Christensen, H. Furuta, H. Walke, H. Fang, H. Ha, I. Mordatch, I. Radosavovic, I. Leal, J. Liang, J. Abou-Chakra, J. Kim, J. Drake, J. Peters, J. Schneider, J. Hsu, J. Bohg, J. Bingham, J. Wu, J. Gao, J. Hu, J. Wu, J. Wu, J. Sun, J. Luo, J. Gu, J. Tan, J. Oh, J. Wu, J. Lu, J. Yang, J. Malik, J. Silvério, J. Hejna, J. Booher, J. Tompson, J. Yang, J. Salvador, J. J. Lim, J. Han, K. Wang, K. Rao, K. Pertsch, K. Hausman, K. Go, K. Gopalakrishnan, K. Goldberg, K. Byrne, K. Oslund, K. Kawaharazuka, K. Black, K. Lin, K. Zhang, K. Ehsani, K. Lekkala, K. Ellis, K. Rana, K. Srinivasan, K. Fang, K. P. Singh, K.-H. Zeng, K. Hatch, K. Hsu, L. Itti, L. Y. Chen, L. Pinto, L. Fei-Fei, L. Tan, L. J. Fan, L. Ott, L. Lee, L. Weihs, M. Chen, M. Lepert, M. Memmel, M. Tomizuka, M. Itkina, M. G. Castro, M. Spero, M. Du, M. Ahn, M. C. Yip, M. Zhang, M. Ding, M. Heo, M. K. Srirama, M. Sharma, M. J. Kim, N. Kanazawa, N. Hansen, N. Heess, N. J. Joshi, N. Suenderhauf, N. Liu, N. D. Palo, N. M. M. Shafiullah, O. Mees, O. Kroemer, O. Bastani, P. R. Sanketi, P. T. Miller, P. Yin, P. Wohlhart, P. Xu, P. D. Fagan, P. Mitrano, P. Sermanet, P. Abbeel, P. Sundaresan, Q. Chen, Q. Vuong, R. Rafailov, R. Tian, R. Doshi, R. Mart'in-Mart'in, R. Baijal, R. Scalise, R. Hendrix, R. Lin, R. Qian, R. Zhang, R. Mendonca, R. Shah, R. Hoque, R. Julian, S. Bustamante, S. Kirmani, S. Levine, S. Lin, S. Moore, S. Bahl, S. Dass, S. Sonawani, S. Song, S. Xu, S. Haldar, S. Karamcheti, S. Adebola, S. Guist, S. Nasiriany, S. Schaal, S. Welker, S. Tian, S. Ramamoorthy, S. Dasari, S. Belkhale, S. Park, S. Nair, S. Mirchandani, T. Osa, T. Gupta, T. Harada, T. Matsushima, T. Xiao, T. Kollar, T. Yu, T. Ding, T. Davchev, T. Z. Zhao, T. Armstrong, T. Darrell, T. Chung, V. Jain, V. Vanhoucke, W. Zhan, W. Zhou, W. Burgard, X. Chen, X. Chen, X. Wang, X. Zhu, X. Geng, X. Liu, X. Liangwei, X. Li, Y. Pang, Y. Lu, Y. J. Ma, Y. Kim, Y. Chebotar, Y. Zhou, Y. Zhu, Y. Wu, Y. Xu, Y. Wang, Y. Bisk, Y. Dou, Y. Cho, Y. Lee, Y. Cui, Y. Cao, Y.-H. Wu, Y. Tang, Y. Zhu, Y. Zhang, Y. Jiang, Y. Li, Y. Li, Y. Iwasawa, Y. Matsuo, Z. Ma, Z. Xu, Z. J. Cui, Z. Zhang, Z. Fu, and Z. Lin, “Open X-Embodiment: Robotic learning datasets and RT-X models,” in Int. Conf. on Robotics and Automation. IEEE, 2024, pp. 6892−6903.
[76]
A. Khazatsky, K. Pertsch, S. Nair, A. Balakrishna, S. Dasari, S. Karamcheti, S. Nasiriany, M. K. Srirama, L. Y. Chen, K. Ellis, P. D. Fagan, J. Hejna, M. Itkina, M. Lepert, Y. J. Ma, P. T. Miller, J. Wu, S. Belkhale, S. Dass, H. Ha, A. Jain, A. Lee, Y. Lee, M. Memmel, S. Park, I. Radosavovic, K. Wang, A. Zhan, K. Black, C. Chi, K. B. Hatch, S. Lin, J. Lu, J. Mercat, A. Rehman, P. R. Sanketi, A. Sharma, C. Simpson, Q. Vuong, H. R. Walke, B. Wulfe, T. Xiao, J. H. Yang, A. Yavary, T. Z. Zhao, C. Agia, R. Baijal, M. G. Castro, D. Chen, Q. Chen, T. Chung, J. Drake, E. P. Foster, J. Gao, V. Guizilini, D. A. Herrera, M. Heo, K. Hsu, J. Hu, M. Z. Irshad, D. Jackson, C. Le, Y. Li, K. Lin, R. Lin, Z. Ma, A. Maddukuri, S. Mirchandani, D. Morton, T. Nguyen, A. O’Neill, R. Scalise, D. Seale, V. Son, S. Tian, E. Tran, A. E. Wang, Y. Wu, A. Xie, J. Yang, P. Yin, Y. Zhang, O. Bastani, G. Berseth, J. Bohg, K. Goldberg, A. Gupta, A. Gupta, D. Jayaraman, J. J. Lim, J. Malik, R. Martín-Martín, S. Ramamoorthy, D. Sadigh, S. Song, J. Wu, M. C. Yip, Y. Zhu, T. Kollar, S. Levine, and C. Finn, “Droid: A large-scale in-the-wild robot manipulation dataset,” arXiv preprint arXiv: 2403.12945, 2024.
[77]
C. Chi, Z. Xu, C. Pan, E. Cousineau, B. Burchfiel, S. Feng, R. Tedrake, and S. Song, “Universal manipulation interface: In-the-wild robot teaching without in-the-wild robots,” arXiv preprint arXiv: 2402.10329, 2024.
[78]
Z. Fu, T. Z. Zhao, and C. Finn, “Mobile ALOHA: Learning bimanual mobile manipulation with low-cost whole-body teleoperation,” arXiv preprint arXiv: 2401.02117, 2024.
[79]
S. Luo, Q. Peng, J. Lv, K. Hong, K. R. Driggs-Campbell, C. Lu, and Y.-L. Li, “Human-agent joint learning for efficient robot manipulation skill acquisition,” arXiv preprint arXiv: 2407.00299, 2024.
[80]
E. Kolve, R. Mottaghi, W. Han, E. VanderBilt, L. Weihs, A. Herrasti, M. Deitke, K. Ehsani, D. Gordon, Y. Zhu, A. Kembhavi, A. Gupta, and A. Farhadi, “AI2-Thor: An interactive 3D environment for visual AI,” arXiv preprint arXiv: 1712.05474, 2017.
[81]
X. Puig, E. Undersander, A. Szot, M. D. Côte, R. Partsey, J. Yang, R. Desai, A. W. Clegg, M. Hlavač, T. Min, T. Gervet, V. Vondruš, V.-P. Berges, J. Turner, O. Maksymets, Z. Kira, M. Kalakrishnan, J. Malik, D. S. Chaplot, U. Jain, D. Batra, A. Rai, and R. Mottaghi, “Habitat 3.0: A co-habitat for humans, avatars and robots,” arXiv preprint arXiv: 2310.13724, 2023.
[82]
F. Xia, A. R. Zamir, Z.-Y. He, A. Sax, J. Malik, and S. Savarese, “Gibson Env: Real-world perception for embodied agents,” in IEEE/CVF Conf. on Computer Vision and Pattern Recognition. IEEE, pp. 9068−9079, 2018.
[83]
A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, “CARLA: An open urban driving simulator,” in Proc. Annual Conf. on Robot Learning, 2017, pp. 1–16.
[84]
E. Todorov, T. Erez, and Y. Tassa, “MuJoCo: A physics engine for model-based control,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems. IEEE, 2012, pp. 5026–5033.
[85]
Y. Wang, Z. Xian, F. Chen, T.-H. Wang, Y. Wang, K. Fragkiadaki, Z. Erickson, D. Held, and C. Gan, “Robogen: Towards unleashing infinite data for automated robot learning via generative simulation,” arXiv preprint arXiv: 2311.01455, 2024.
[86]
L. Wang, Y. Ling, Z. Yuan, M. Shridhar, C. Bao, Y. Qin, B. Wang, H. Xu, and X. Wang, “GenSim: Generating robotic simulation tasks via large language models,” arXiv preprint arXiv: 2310.01361, 2024.
[87]
Y. Du, M. Yang, B. Dai, H. Dai, O. Nachum, J. B. Tenenbaum, D. Schuurmans, and P. Abbeel, “Learning universal policies via text-guided video generation,” in Conf. on Neural Information Processing Systems, pp. 9156–9172, 2023.
[88]
Z. Wan, Y. Xie, C. Zhang, Z. Lin, Z. Wang, S. Stepputtis, D. Ramanan, and K. P. Sycara, “Instructpart: Affordance-based part segmentation from language instruction,” in AAAI-2024 Workshop on Public Sector LLMs: Algorithmic and Sociotechnical Design, 2024.
[89]
T. Gupta, W. Gong, C. Ma, N. Pawlowski, A. Hilmkil, M. Scetbon, A. Famoti, A. J. Llorens, J. Gao, S. Bauer, D. Kragic, B. Schölkopf, and C. Zhang, “The essential role of causality in foundation world models for embodied AI,” arXiv preprint arXiv: 2402.06665, 2024.
[90]
M. Kaspar, J. D. M. Osorio, and J. Bock, “Sim2real transfer for reinforcement learning without dynamics randomization,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2020, pp. 4383–4388.
[91]
B. Planche, Z. Wu, K. Ma, S. Sun, S. Kluckner, T. Chen, A. Hutter, S. Zakharov, H. Kosch, and J. Ernst, “DepthSynth: Real-time realistic synthetic data generation from CAD models for 2.5D recognition,” in Int. Conf. on 3D Vision, pp. 1−10, 2017.
[92]
J. Truong, S. Chernova, and D. Batra, “Bi-directional domain adaptation for sim2real transfer of embodied navigation agents,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 2634–2641, 2021. doi: 10.1109/LRA.2021.3062303
[93]
J. Shi, Y. Jin, D. Li, H. Niu, Z. Jin, and H. Wang, “ASGrasp: Generalizable transparent object reconstruction and grasping from RGB-D active stereo camera,” arXiv preprint arXiv: 2405.05648, 2024.
[94]
Y. Mu, Q. Zhang, M. Hu, W. Wang, M. Ding, J. Jin, B. Wang, J. Dai, Y. Qiao, and P. Luo, “EmbodiedGPT: Vision-language pre-training via embodied chain of thought,” Advances in Neural Information Processing Systems, vol. 36, 2024.
[95]
D. Kalashnikov, J. Varley, Y. Chebotar, B. Swanson, R. Jonschkowski, C. Finn, S. Levine, and K. Hausman, “MT-Opt: Continuous multi-task robotic reinforcement learning at scale,” arXiv preprint arXiv: 2104.08212, 2021.
[96]
J. Long, Z. Wang, Q. Li, J. Gao, L. Cao, and J. Pang, “Hybrid internal model: A simple and efficient learner for agile legged locomotion,” arXiv preprint arXiv: 2312.11460, 2023.
[97]
G. Cao, Y. Zhou, D. Bollegala, and S. Luo, “Spatio-temporal attention model for tactile texture recognition,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, 2020, pp. 9896–9902.
[98]
C. Wu, J. Chen, Q. Cao, J. Zhang, Y. Tai, L. Sun, and K. Jia, “Grasp proposal networks: An end-to-end solution for visual learning of robotic grasps,” Advances in Neural Information Processing Systems, vol. 33, pp. 13174–13184, 2020.
[99]
S. Ainetter and F. Fraundorfer, “End-to-end trainable deep neural network for robotic grasp detection and semantic segmentation from RGB,” in Int. Conf. on Robotics and Automation. IEEE, 2021, pp. 13452–13458.
[100]
M. Shridhar, L. Manuelli, and D. Fox, “CLIPort: What and where pathways for robotic manipulation,” in Conf. on Robot Learning. PMLR, 2022, pp. 894–906.
[101]
Z. Qi, R. Dong, S. Zhang, H. Geng, C. Han, Z. Ge, H. Wang, L. Yi, and K. Ma, “ShapeLLM: Universal 3D object understanding for embodied interaction,” arXiv preprint arXiv: 2402.17766, 2024.
[102]
L. Annamalai and C. S. Thakur, “EventASEG: An event-based asynchronous segmentation of road with likelihood attention,” IEEE Robotics and Automation Letters, vol. 9, no. 8, pp. 6951–6958, 2024.
[103]
T. Carta, C. Romac, T. Wolf, S. Lamprier, O. Sigaud, and P.-Y. Oudeyer, “Grounding large language models in interactive environments with online reinforcement learning,” in Int. Conf. on Machine Learning. PMLR, 2023, pp. 3676–3713.
[104]
H. Zhang, W. Du, J. Shan, Q. Zhou, Y. Du, J. B. Tenenbaum, T. Shu, and C. Gan, “Building cooperative embodied agents modularly with large language models,” in Int. Conf. on Learning Representations, 2024.
[105]
W. Huang, F. Xia, T. Xiao, H. Chan, J. Liang, P. Florence, A. Zeng, J. Tompson, I. Mordatch, Y. Chebotar, P. Sermanet, N. Brown, T. Jackson, L. Luu, S. Levine, K. Hausman, and B. Ichter, “Inner monologue: Embodied reasoning through planning with language models,” in Annual Conf. on Robot Learning, PMLR, vol. 205, pp. 1769–1782, 2022.
[106]
Y. Long, X. Li, W. Cai, and H. Dong, “Discuss before moving: Visual language navigation via multi-expert discussions,” in IEEE Int. Conf. on Robotics and Automation, pp. 4842−4849, 2024.
[107]
Y. Ding, H. Geng, C. Xu, X. Fang, J. Zhang, S. Wei, Q. Dai, Z. Zhang, and H. Wang, “Open6DOR: Benchmarking open-instruction 6-DoF object rearrangement and a VLM-based approach,” in IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 7359–7366, 2024.
[108]
W. Huang, C. Wang, R. Zhang, Y. Li, J. Wu, and F.-F. Li, “VoxPoser: Composable 3D value maps for robotic manipulation with language models,” arXiv preprint arXiv: 2307.05973, 2023.
[109]
H. Zhang, J. Xu, Y. Mo, and T. Kong, “InViG: Open-ended interactive visual grounding in human-robot interaction with 500k dialogues,” in Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition Workshops, pp. 5508−5518, 2024.
[110]
Z. Chen, J. Wu, W. Wang, W. Su, G. Chen, S. Xing, M. Zhong, Q. Zhang, X. Zhu, L. Lu, B. Li, P. Luo, T. Lu, Y. Qiao, and J. Dai, “InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks,” arXiv preprint arXiv: 2312.14238, 2023.
[111]
R. Dong, C. Han, Y. Peng, Z. Qi, Z. Ge, J. Yang, L. Zhao, J. Sun, H. Zhou, H. Wei, X. Kong, X. Zhang, K. Ma, and L. Yi, “DreamLLM: Synergistic multimodal comprehension and creation,” in Int. Conf. on Learning Representations, 2024.
[112]
M. Ahn, D. Dwibedi, C. Finn, M. G. Arenas, K. Gopalakrishnan, K. Hausman, B. Ichter, A. Irpan, N. Joshi, R. Julian, S. Kirmani, I. Leal, E. Lee, S. Levine, Y. Lu, I. Leal, S. Maddineni, K. Rao, D. Sadigh, P. Sanketi, P. Sermanet, Q. Vuong, S. Welker, F. Xia, T. Xiao, P. Xu, S. Xu, and Z. Xu, “AutoRT: Embodied foundation models for large scale orchestration of robotic agents,” arXiv preprint arXiv: 2401.12963, 2024.
[113]
S. Gao, Y. Dai, and A. Nathan, “Tactile and vision perception for intelligent humanoids,” Advanced Intelligent Systems, vol. 4, no. 2, p. 2100074, 2022. doi: 10.1002/aisy.202100074
[114]
H. Liu, D. Guo, F. Sun, W. Yang, S. Furber, and T. Sun, “Embodied tactile perception and learning,” Brain Science Advances, vol. 6, no. 2, pp. 132–158, 2020. doi: 10.26599/BSA.2020.9050012
[115]
H. Li and L. Cheng, “A systematic review on hand exoskeletons from the mechatronics aspect,” IEEE/ASME Trans. Mechatronics, pp. 1−19, 2024.
[116]
W. Yuan, Y. Mo, S. Wang, and E. H. Adelson, “Active clothing material perception using tactile sensing and deep learning,” in IEEE Int. Conf. on Robotics and Automation, 2018, pp. 4842–4849.
[117]
Z. Li, L. Cheng, Z. Liu, J. Wei, and Y. Wang, “Focers: An ultrasensitive and robust soft optical 3D tactile sensor,” Soft Robotics, 2025.
[118]
J. Yang, X. Chen, S. Qian, N. Madaan, M. Iyengar, D. F. Fouhey, and J. Chai, “LLM-Grounder: Open-vocabulary 3D visual grounding with large language model as an agent,” in Int. Conf. on Robotics and Automation, 2024, pp. 7694–7701.
[119]
Z. Li, X. Wu, H. Du, H. Nghiem, and G. Shi, “Benchmark evaluations, applications, and challenges of large vision language models: A survey,” arXiv preprint arXiv: 2501.02189, 2025.
[120]
G. Shi, W. Hönig, X. Shi, Y. Yue, and S.-J. Chung, “Neural-swarm2: Planning and control of heterogeneous multirotor swarms using learned interactions,” IEEE Trans. Robotics, vol. 38, no. 2, pp. 1063–1079, 2021.
[121]
A. Gupta, S. Savarese, S. Ganguli, and L. Fei-Fei, “Embodied intelligence via learning and evolution,” Nature Communications, vol. 12, no. 1, p. 5721, 2021. doi: 10.1038/s41467-021-25874-z
[122]
Y. Hu, J. Yang, L. Chen, K. Li, C. Sima, X. Zhu, S. Chai, S. Du, T. Lin, W. Wang, L. Lu, X. Jia, Q. Liu, J. Dai, Y. Qiao, and H. Li, “Planning-oriented autonomous driving,” in Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition, pp. 17853–17862, 2023.
[123]
K. L. Downing, “Neuroscientific implications for situated and embodied artificial intelligence,” Connection Science, vol. 19, no. 1, pp. 75–104, 2007. doi: 10.1080/09540090701192584
[124]
J. Chen, J. Li, Y. Huang, C. Garrett, D. Sun, C. Fan, A. Hofmann, C. Mueller, S. Koenig, and B. C. Williams, “Cooperative task and motion planning for multi-arm assembly systems,” arXiv preprint arXiv: 2203.02475, 2022.
[125]
X. Liu, X. Li, D. Guo, S. Tan, H. Liu, and F. Sun, “Embodied multi-agent task planning from ambiguous instruction,” in Robotics: Science and Systems, pp. 1–14, 2022.
[126]
X. Liu, D. Guo, H. Liu, and F. Sun, “Multi-agent embodied visual semantic navigation with scene prior knowledge,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 3154–3161, 2022. doi: 10.1109/LRA.2022.3145964
[127]
S. Sukhbaatar, A. Szlam, and R. Fergus, “Learning multiagent communication with backpropagation,” Advances in Neural Information Processing Systems, vol. 29, 2016.
[128]
Y. Hoshen, “VAIN: Attentional multi-agent predictive modeling,” Advances in Neural Information Processing Systems, vol. 30, pp. 6283–6292, 2017.
[129]
J. Jiang and Z. Lu, “Learning attentional communication for multi-agent cooperation,” Advances in Neural Information Processing Systems, vol. 31, pp. 646−654, 2018.
[130]
A. Das, T. Gervet, J. Romoff, D. Batra, D. Parikh, M. Rabbat, and J. Pineau, “TarMAC: Targeted multi-agent communication,” in Int. Conf. on Machine Learning, PMLR, vol. 97, 2019, pp. 1538–1546.
[131]
Y.-C. Liu, J. Tian, N. Glaser, and Z. Kira, “When2com: Multi-agent perception via communication graph grouping,” in Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2020.
[132]
Y.-C. Liu, J. Tian, C.-Y. Ma, N. Glaser, C.-W. Kuo, and Z. Kira, “Who2com: Collaborative perception via learnable handshake communication,” in Int. Conf. on Robotics and Automation. IEEE, 2020, pp. 6876–6883.
[133]
Y. Hu, S. Fang, Z. Lei, Y. Zhong, and S. Chen, “Where2comm: Communication-efficient collaborative perception via spatial confidence maps,” Advances in Neural Information Processing Systems, vol. 35, pp. 4874–4886, 2022.
[134]
K. Yang, D. Yang, J. Zhang, H. Wang, P. Sun, and L. Song, “What2comm: Towards communication-efficient collaborative perception via feature decoupling,” in ACM Int. Conf. on Multimedia, 2023, pp. 7686–7695.
[135]
D. Yang, K. Yang, Y. Wang, J. Liu, Z. Xu, R. Yin, P. Zhai, and L. Zhang, “How2comm: Communication-efficient and collaboration-pragmatic multi-agent perception,” Advances in Neural Information Processing Systems, vol. 36, 2024.
[136]
X. Liu, D. Guo, X. Zhang, and H. Liu, “Heterogeneous embodied multi-agent collaboration,” IEEE Robotics and Automation Letters, vol. 9, no. 2, pp. 5377−5384, 2024.
[137]
V. Sharma, P. Goyal, K. Lin, G. Thattai, Q. Gao, and G. S. Sukhatme, “Ch-MARL: A multimodal benchmark for cooperative, heterogeneous multi-agent reinforcement learning,” arXiv preprint arXiv: 2208.13626, 2022.
[138]
R. Gong, X. Gao, Q. Gao, S. Shakiah, G. Thattai, and G. S. Sukhatme, “Lemma: Learning language-conditioned multi-robot manipulation,” IEEE Robotics and Automation Letters, vol. 8, no. 10, pp. 6835−6842, 2023.
[139]
S. Patel, S. Wani, U. Jain, A. G. Schwing, S. Lazebnik, M. Savva, and A. X. Chang, “Interpretation of emergent communication in heterogeneous collaborative embodied agents,” in Proc. the IEEE/CVF Int. Conf. on Computer Vision, 2021, pp. 15953–15963.
[140]
Y. Du, S. Li, A. Torralba, J. B. Tenenbaum, and I. Mordatch, “Improving factuality and reasoning in language models through multiagent debate,” arXiv preprint arXiv: 2305.14325, 2023.
[141]
R. Galin and R. Meshcheryakov, “Review on human-robot interaction during collaboration in a shared workspace,” in Interactive Collaborative Robotics: 4th Int. Conf., Istanbul, Turkey. Springer, 2019, pp. 63–74.
[142]
T. B. Sheridan, “Human–robot interaction: Status and challenges,” Human Factors, vol. 58, no. 4, pp. 525–532, 2016. doi: 10.1177/0018720816644364
[143]
A. Das, S. Datta, G. Gkioxari, S. Lee, D. Parikh, and D. Batra, “Embodied question answering,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition Workshops, pp. 2054−2063, 2017.
[144]
S. Tan, M. Ge, D. Guo, H. Liu, and F. Sun, “Knowledge-based embodied question answering,” IEEE Robotics and Automation Letters, vol. 6, no. 3, pp. 4749−4756, 2021.
[145]
L. Yu, X. Chen, G. Gkioxari, M. Bansal, T. L. Berg, and D. Batra, “Multi-target embodied question answering,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 6309−6318, 2019.
[146]
U. Jain, L. Weihs, E. Kolve, A. Farhadi, S. Lazebnik, A. Kembhavi, and A. Schwing, “A cordial sync: Going beyond marginal policies for multi-agent embodied tasks,” in European Conf. on Computer Vision, 2020, pp. 471–490.
[147]
U. Jain, L. Weihs, E. Kolve, M. Rastegari, S. Lazebnik, A. Farhadi, A. G. Schwing, and A. Kembhavi, “Two body problem: Collaborative visual task completion,” in Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2019, pp. 6689–6699.
[148]
A. Szot, U. Jain, D. Batra, Z. Kira, R. Desai, and A. Rai, “Adaptive coordination in social embodied rearrangement,” in Int. Conf. on Machine Learning. PMLR, 2023, pp. 33365–33380.
[149]
Y. Liu, X. Cao, T. Chen, Y. Jiang, J. You, M. Wu, X. Wang, M. Feng, Y. Jin, and J. Chen, “From screens to scenes: A survey of embodied AI in healthcare,” arXiv preprint arXiv: 2501.07468, 2025.
[150]
S. E. Salcudean, H. Moradi, D. G. Black, and N. Navab, “Robot-assisted medical imaging: A review,” Proc. the IEEE, vol. 110, no. 7, pp. 951–967, 2022. doi: 10.1109/JPROC.2022.3162840
[151]
T. J. Loftus, M. S. Altieri, J. A. Balch, K. L. Abbott, J. Choi, J. S. Marwaha, D. A. Hashimoto, G. A. Brat, Y. Raftopoulos, H. L. Evans, G. P. Jackson, D. S. Walsh, C. J. Tignanelli, “Artificial intelligence–enabled decision support in surgery: State-of-the-art and future directions,” Annals of Surgery, vol. 278, no. 1, pp. 51–58, 2023. doi: 10.1097/SLA.0000000000005853
[152]
S. Jana, G. F. Italiano, M. J. Kashyop, A. L. Konstantinidis, E. Kosinas, and P. S. Mandal, “Online drone scheduling for last-mile delivery,” in Int. Colloquium on Structural Information and Communication Complexity. Springer, 2024, pp. 488–493.
[153]
N. Cherif, W. Jaafar, H. Yanikomeroglu, and A. Yongacoglu, “RL-based cargo-UAV trajectory planning and cell association for minimum handoffs, disconnectivity, and energy consumption,” IEEE Trans. on Vehicular Technology, vol. 73, no. 5, pp. 7304−7309, 2023.
[154]
R. Bokade, X. Jin, and C. Amato, “Multi-agent reinforcement learning based on representational communication for large-scale traffic signal control,” IEEE Access, vol. 11, pp. 47646–47658, 2023. doi: 10.1109/ACCESS.2023.3275883
[155]
H. Zhang, S.-H. Chan, J. Zhong, J. Li, P. Kolapo, S. Koenig, Z. Agioutantis, S. Schafrik, and S. Nikolaidis, “Multi-robot geometric task-and-motion planning for collaborative manipulation tasks,” Autonomous Robots, vol. 47, no. 8, pp. 1537–1558, 2023. doi: 10.1007/s10514-023-10148-y
[156]
A. Fenoy, J. Zagoli, F. Bistaffa, and A. Farinelli, “Attention for the allocation of tasks in multi-agent pickup and delivery,” in Proc. the 39th ACM/SIGAPP Symposium on Applied Computing, 2024, pp. 622–629.
[157]
J. Xu, Q. Sun, Q.-L. Han, and Y. Tang, “When embodied AI meets Industry 5.0: Human-centered smart manufacturing,” IEEE/CAA Journal of Automatica Sinica, vol. 12, no. 3, pp. 485–501, 2025. doi: 10.1109/JAS.2025.125327
[158]
W. Xiang, K. Yu, F. Han, L. Fang, D. He, and Q.-L. Han, “Advanced manufacturing in industry 5.0: A survey of key enabling technologies and future trends,” IEEE Trans. Industrial Informatics, vol. 20, no. 2, pp. 1055–1068, 2024. doi: 10.1109/TII.2023.3274224
[159]
Y. Ma, Z. Song, Y. Zhuang, J. Hao, and I. King, “A survey on vision-language-action models for embodied AI,” arXiv preprint arXiv: 2405.14093, 2024.
[160]
A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” arXiv preprint arXiv: 2312.00752, 2023.
[161]
Y. Liu, G. Li, and L. Lin, “Cross-modal causal relational reasoning for event-level visual question answering,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 45, pp. 11624–11641, 2023.