A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical and experimental research and development in all areas of automation.
Volume 13 Issue 4
Apr.  2026

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 19.2, Top 1 (SCI Q1)
  • CiteScore: 28.2, Top 1% (Q1)
  • Google Scholar h5-index: 95, Top 5
W. Ma, Y. Guo, Q.-L. Han, W. Zhou, X. Zhu, J. Xiong, S. Wen, and Y. Xiang, IEEE/CAA J. Autom. Sinica, vol. 13, no. 4, pp. 776–795, Apr. 2026. doi: 10.1109/JAS.2026.125993

Understanding Agentic AI: Algorithms and Infrastructure

doi: 10.1109/JAS.2026.125993
  • Abstract: The rapid evolution of large language models (LLMs) towards autonomous agentic artificial intelligence (AI) necessitates a systemic overhaul across algorithms, infrastructure, and architectures. This paper presents a unified view of the “Agentic AI Infrastructure,” connecting research threads often studied in isolation. First, post-training algorithms are reviewed, contrasting traditional reinforcement learning (RL) with emerging reasoning-centric methods and test-time scaling strategies. Next, the transition of RL training frameworks is analyzed, from monolithic, colocated designs to disaggregated, asynchronous architectures tailored to the extreme variance of agentic rollouts. Furthermore, progress in agent construction is synthesized, covering reflection, planning, tool use, and multi-agent collaboration. By integrating these layers, the paper elucidates how agentic AI systems impose unique demands on the underlying training systems. Finally, open challenges are outlined, covering capability scaling, efficiency, safety, privacy, and governance for reliable real-world deployment of agentic AI.
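The agent-construction layer the abstract mentions (reflection, planning, tool use) is commonly realized as a reason-act-observe loop: the model plans a tool call, executes it, and reflects on the observation via memory. A minimal illustrative sketch follows; all names here (`Tool`, `Agent`, `toy_llm`) are hypothetical stand-ins, not APIs from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Tool:
    name: str
    run: Callable[[str], str]  # takes a textual argument, returns an observation


@dataclass
class Agent:
    llm: Callable[[str], str]      # stand-in for an LLM call (prompt -> decision)
    tools: dict[str, Tool]
    memory: list[str] = field(default_factory=list)

    def step(self, task: str) -> str:
        # "Plan": ask the model which tool to invoke, conditioned on past steps.
        prompt = f"Task: {task}\nHistory: {self.memory}\nChoose tool:"
        decision = self.llm(prompt)                  # e.g. "calc: 2+3"
        tool_name, _, arg = decision.partition(":")
        # "Act": execute the chosen tool.
        observation = self.tools[tool_name.strip()].run(arg.strip())
        # "Reflect": record the action/observation pair for later steps.
        self.memory.append(f"{decision} -> {observation}")
        return observation


def toy_llm(prompt: str) -> str:
    # Fixed policy standing in for a real model, for demonstration only.
    return "calc: 2+3"


calc = Tool("calc", run=lambda expr: str(eval(expr)))  # eval: demo only, unsafe in general
agent = Agent(llm=toy_llm, tools={"calc": calc})
print(agent.step("add two and three"))  # prints 5
```

In a real agentic system the `llm` callable would be a served model, tool execution would run in a sandbox, and many such rollouts of highly variable length would be generated concurrently, which is precisely what drives the disaggregated, asynchronous RL training architectures the paper surveys.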

     




    Figures (5) / Tables (3)
