IEEE/CAA Journal of Automatica Sinica, Volume 13, Issue 4
Citation: W. Ma, Y. Guo, Q.-L. Han, W. Zhou, X. Zhu, J. Xiong, S. Wen, and Y. Xiang, IEEE/CAA J. Autom. Sinica, vol. 13, no. 4, pp. 776–795, Apr. 2026. doi: 10.1109/JAS.2026.125993
[1] Z. Wang, S. Cai, G. Chen, A. Liu, X. Ma, Y. Liang, and Team CraftJarvis, “Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents,” in Proc. 37th Int. Conf. Neural Information Processing Systems, New Orleans, USA, 2023, Art. no. 1480.
[2] X. Li, S. Wang, S. Zeng, Y. Wu, and Y. Yang, “A survey on LLM-based multi-agent systems: Workflow, infrastructure, and challenges,” Vicinagearth, vol. 1, no. 1, Art. no. 9, Oct. 2024. doi: 10.1007/s44336-024-00009-2
[3] J. S. Park, J. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative agents: Interactive simulacra of human behavior,” in Proc. 36th Annu. ACM Symp. on User Interface Software and Technology, San Francisco, USA, 2023, Art. no. 2.
[4] X. Zhu, Y. Chen, H. Tian, C. Tao, W. Su, C. Yang, et al., “Ghost in the Minecraft: Generally capable agents for open-world environments via large language models with text-based knowledge and memory,” arXiv preprint arXiv: 2305.17144, 2023.
[5] G. Wang, Y. Xie, Y. Jiang, A. Mandlekar, C. Xiao, Y. Zhu, L. Fan, and A. Anandkumar, “Voyager: An open-ended embodied agent with large language models,” Trans. Mach. Learn. Res., Mar. 2024.
[6] C. Qu, S. Dai, X. Wei, H. Cai, S. Wang, D. Yin, J. Xu, and J.-R. Wen, “Tool learning with large language models: A survey,” Front. Comput. Sci., vol. 19, no. 8, Art. no. 198343, Jan. 2025. doi: 10.1007/s11704-024-40678-2
[7] T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, E. Hambro, L. Zettlemoyer, N. Cancedda, and T. Scialom, “Toolformer: Language models can teach themselves to use tools,” in Proc. 37th Int. Conf. Neural Information Processing Systems, New Orleans, USA, 2023, Art. no. 2997.
[8] C. Qian, W. Liu, H. Liu, N. Chen, Y. Dang, J. Li, et al., “ChatDev: Communicative agents for software development,” in Proc. 62nd Annu. Meet. of the Association for Computational Linguistics, Bangkok, Thailand, 2024, pp. 15174−15186.
[9] Z. Li, W. Wu, Y. Guo, J. Sun, and Q.-L. Han, “Embodied multi-agent systems: A review,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 6, pp. 1095–1116, Jun. 2025. doi: 10.1109/JAS.2025.125552
[10] W. Zhou, X. Zhu, Q.-L. Han, L. Li, X. Chen, S. Wen, and Y. Xiang, “The security of using large language models: A survey with emphasis on ChatGPT,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 1, pp. 1–26, Jan. 2025.
[11] J. Lin, X. Zeng, J. Zhu, S. Wang, J. Shun, J. Wu, and D. Zhou, “Plan and budget: Effective and efficient test-time scaling on reasoning large language models,” arXiv preprint arXiv: 2505.16122, 2025.
[12] Q. Zhang, F. Lyu, Z. Sun, L. Wang, W. Zhang, W. Hua, et al., “A survey on test-time scaling in large language models: What, how, where, and how well?” arXiv preprint arXiv: 2503.24235, 2025.
[13] Y. Zhong, Z. Zhang, X. Song, H. Hu, C. Jin, B. Wu, et al., “StreamRL: Scalable, heterogeneous, and elastic RL for LLMs with disaggregated stream generation,” arXiv preprint arXiv: 2504.15930, 2025.
[14] G. Sheng, C. Zhang, Z. Ye, X. Wu, W. Zhang, R. Zhang, Y. Peng, H. Lin, and C. Wu, “HybridFlow: A flexible and efficient RLHF framework,” in Proc. Twentieth European Conf. Computer Systems, Rotterdam, Netherlands, 2025, pp. 1279−1297.
[15] G. Sheng, Y. Tong, B. Wan, W. Zhang, C. Jia, X. Wu, et al., “Laminar: A scalable asynchronous RL post-training framework,” arXiv preprint arXiv: 2510.12633, 2025.
[16] W. Wang, S. Xiong, G. Chen, W. Gao, S. Guo, Y. He, et al., “Reinforcement learning optimization for large-scale learning: An efficient and user-friendly scaling library,” arXiv preprint arXiv: 2506.06122, 2025.
[17] R. Kamoi, Y. Zhang, N. Zhang, J. Han, and R. Zhang, “When can LLMs actually correct their own mistakes? A critical survey of self-correction of LLMs,” Trans. Assoc. Comput. Linguist., vol. 12, pp. 1417–1440, Nov. 2024. doi: 10.1162/tacl_a_00713
[18] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv: 1707.06347, 2017.
[19] M. Luo, J. Yao, R. Liaw, E. Liang, and I. Stoica, “IMPACT: Importance weighted asynchronous architectures with clipped target networks,” in Proc. 8th Int. Conf. Learning Representations, Addis Ababa, Ethiopia, 2020.
[20] Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, X. Bi, et al., “DeepSeekMath: Pushing the limits of mathematical reasoning in open language models,” arXiv preprint arXiv: 2402.03300, 2024.
[21] Q. Yu, Z. Zhang, R. Zhu, Y. Yuan, X. Zuo, Y. Yue, et al., “DAPO: An open-source LLM reinforcement learning system at scale,” arXiv preprint arXiv: 2503.14476, 2025.
[22] J. Liu, J. Obando-Ceron, H. Lu, Y. He, W. Wang, W. Su, B. Zheng, P. S. Castro, A. Courville, and L. Pan, “Asymmetric proximal policy optimization: Mini-critics boost LLM reasoning,” arXiv preprint arXiv: 2510.01656, 2025.
[23] H. Wang, S. Hao, H. Dong, S. Zhang, Y. Bao, Z. Yang, and Y. Wu, “Offline reinforcement learning for LLM multi-step reasoning,” in Proc. Findings of the Association for Computational Linguistics: ACL 2025, Vienna, Austria, 2025, pp. 8881−8893.
[24] R. Rafailov, A. Sharma, E. Mitchell, S. Ermon, C. D. Manning, and C. Finn, “Direct preference optimization: Your language model is secretly a reward model,” in Proc. 37th Int. Conf. Neural Information Processing Systems, New Orleans, USA, 2023, Art. no. 2338.
[25] M. G. Azar, Z. D. Guo, B. Piot, R. Munos, M. Rowland, M. Valko, and D. Calandriello, “A general theoretical paradigm to understand learning from human preferences,” in Proc. 27th Int. Conf. Artificial Intelligence and Statistics, Valencia, Spain, 2024, pp. 4447−4455.
[26] K. Ethayarajh, W. Xu, N. Muennighoff, D. Jurafsky, and D. Kiela, “Model alignment as prospect theoretic optimization,” in Proc. 41st Int. Conf. Machine Learning, Vienna, Austria, 2024, pp. 12634−12651.
[27] H. Yuan, Z. Yuan, C. Tan, W. Wang, S. Huang, and F. Huang, “RRHF: Rank responses to align language models with human feedback,” in Proc. 37th Int. Conf. Neural Information Processing Systems, New Orleans, USA, 2023, Art. no. 482.
[28] Y. Zhao, R. Joshi, T. Liu, M. Khalman, M. Saleh, and P. J. Liu, “SLiC-HF: Sequence likelihood calibration with human feedback,” arXiv preprint arXiv: 2305.10425, 2023.
[29] J. Hong, N. Lee, and J. Thorne, “ORPO: Monolithic preference optimization without reference model,” in Proc. Conf. Empirical Methods in Natural Language Processing, Miami, USA, 2024, pp. 11170−11189.
[30] Y. Meng, M. Xia, and D. Chen, “SimPO: Simple preference optimization with a reference-free reward,” in Proc. 38th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2024, Art. no. 3946.
[31] Y. Xie, K. Kawaguchi, Y. Zhao, J. X. Zhao, M.-Y. Kan, J. He, and M. Xie, “Self-evaluation guided beam search for reasoning,” in Proc. 37th Int. Conf. Neural Information Processing Systems, New Orleans, USA, 2023, Art. no. 1802.
[32] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, et al., “Chain-of-thought prompting elicits reasoning in large language models,” in Proc. 36th Int. Conf. Neural Information Processing Systems, New Orleans, USA, 2022, Art. no. 1800.
[33] X. Wang, J. Wei, D. Schuurmans, Q. V. Le, E. H. Chi, S. Narang, A. Chowdhery, and D. Zhou, “Self-consistency improves chain of thought reasoning in language models,” in Proc. Eleventh Int. Conf. Learning Representations, Kigali, Rwanda, 2023.
[34] A. Madaan, N. Tandon, P. Gupta, S. Hallinan, L. Gao, S. Wiegreffe, et al., “SELF-REFINE: Iterative refinement with self-feedback,” in Proc. 37th Int. Conf. Neural Information Processing Systems, New Orleans, USA, 2023, Art. no. 2019.
[35] S. Yao, D. Yu, J. Zhao, I. Shafran, T. L. Griffiths, Y. Cao, and K. Narasimhan, “Tree of thoughts: Deliberate problem solving with large language models,” in Proc. 37th Int. Conf. Neural Information Processing Systems, New Orleans, USA, 2023, Art. no. 517.
[36] R. Coulom, “Efficient selectivity and backup operators in Monte-Carlo tree search,” in Proc. 5th Int. Conf. Computers and Games, Turin, Italy, 2006, pp. 72−83.
[37] D. Zhang, S. Zhoubian, Z. Hu, Y. Yue, Y. Dong, and J. Tang, “ReST-MCTS*: LLM self-training via process reward guided tree search,” in Proc. 38th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2024, Art. no. 2066.
[38] M. Besta, N. Blach, A. Kubicek, R. Gerstenberger, M. Podstawski, L. Gianinazzi, et al., “Graph of thoughts: Solving elaborate problems with large language models,” in Proc. 38th AAAI Conf. Artificial Intelligence, Vancouver, Canada, 2024, Art. no. 1972.
[39] K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, et al., “Training verifiers to solve math word problems,” arXiv preprint arXiv: 2110.14168, 2021.
[40] A. Ni, S. Iyer, D. Radev, V. Stoyanov, W.-T. Yih, S. I. Wang, and X. V. Lin, “LEVER: Learning to verify language-to-code generation with execution,” in Proc. 40th Int. Conf. Machine Learning, Honolulu, USA, 2023, Art. no. 1086.
[41] H. Lightman, V. Kosaraju, Y. Burda, H. Edwards, B. Baker, T. Lee, J. Leike, J. Schulman, I. Sutskever, and K. Cobbe, “Let’s verify step by step,” in Proc. Twelfth Int. Conf. Learning Representations, Vienna, Austria, 2024.
[42] L. Zhang, A. Hosseini, H. Bansal, M. Kazemi, A. Kumar, and R. Agarwal, “Generative verifiers: Reward modeling as next-token prediction,” in Proc. Thirteenth Int. Conf. Learning Representations, Singapore, Singapore, 2025.
[43] Y. Weng, M. Zhu, F. Xia, B. Li, S. He, S. Liu, B. Sun, K. Liu, and J. Zhao, “Large language models are better reasoners with self-verification,” in Proc. Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, Singapore, 2023, pp. 2550−2575.
[44] N. Singhi, H. Bansal, A. Hosseini, A. Grover, K.-W. Chang, M. Rohrbach, and A. Rohrbach, “When to solve, when to verify: Compute-optimal problem solving and generative verification for LLM reasoning,” in Proc. Second Conf. Language Modeling, Montreal, Canada, 2025.
[45] T. Schuster, A. Fisch, J. Gupta, M. Dehghani, D. Bahri, V. Q. Tran, Y. Tay, and D. Metzler, “Confident adaptive language modeling,” in Proc. 36th Int. Conf. Neural Information Processing Systems, New Orleans, USA, 2022, Art. no. 1269.
[46] M. Yue, J. Zhao, M. Zhang, L. Du, and Z. Yao, “Large language model cascades with mixture of thought representations for cost-efficient reasoning,” in Proc. Twelfth Int. Conf. Learning Representations, Vienna, Austria, 2024.
[47] X. Ning, Z. Lin, Z. Zhou, Z. Wang, H. Yang, and Y. Wang, “Skeleton-of-thought: Prompting LLMs for efficient parallel generation,” in Proc. Twelfth Int. Conf. Learning Representations, Vienna, Austria, 2024.
[48] Z. Yao, R. Y. Aminabadi, O. Ruwase, S. Rajbhandari, X. Wu, A. A. Awan, et al., “DeepSpeed-Chat: Easy, fast and affordable RLHF training of ChatGPT-like models at all scales,” arXiv preprint arXiv: 2308.01320, 2023.
[49] L. von Werra, Y. Belkada, L. Tunstall, E. Beeching, T. Thrush, N. Lambert, S. Huang, K. Rasul, and Q. Gallouedec. (2020). TRL: Transformer reinforcement learning [Online]. Available: https://github.com/huggingface/trl
[50] Y. Zhao, J. Huang, J. Hu, X. Wang, Y. Mao, D. Zhang, et al., “SWIFT: A scalable lightweight infrastructure for fine-tuning,” in Proc. 39th AAAI Conf. Artificial Intelligence, Philadelphia, USA, 2025, Art. no. 3485.
[51] Z. Zhu, C. Xie, X. Lv, and slime Contributors. (2025). slime: An LLM post-training framework for RL scaling [Online]. Available: https://github.com/THUDM/slime
[52] Z. Wang, K. Wang, Q. Wang, P. Zhang, L. Li, Z. Yang, et al., “RAGEN: Understanding self-evolution in LLM agents via multi-turn reinforcement learning,” arXiv preprint arXiv: 2504.20073, 2025.
[53] Y. Zhong, Z. Zhang, B. Wu, S. Liu, Y. Chen, C. Wan, et al., “Optimizing RLHF training for large language models with stage fusion,” in Proc. 22nd USENIX Symp. on Networked Systems Design and Implementation, Philadelphia, USA, 2025, Art. no. 26.
[54] Z. Wang, T. Zhou, L. Liu, A. Li, J. Hu, D. Yang, et al., “DistFlow: A fully distributed RL framework for scalable and efficient LLM post-training,” arXiv preprint arXiv: 2507.13833, 2025.
[55] J. Hu, X. Wu, W. Shen, J. K. Liu, Z. Zhu, W. Wang, et al., “OpenRLHF: An easy-to-use, scalable and high-performance RLHF framework,” arXiv preprint arXiv: 2405.11143, 2024.
[56] NVIDIA. (2025). NeMo RL: A scalable and efficient post-training library [Online]. Available: https://github.com/NVIDIA-NeMo/RL
[57] S. Cao, S. Hegde, D. Li, T. Griggs, S. Liu, E. Tang, et al. (2025). SkyRL-v0: Train real-world long-horizon agents via reinforcement learning [Online]. Available: https://github.com/NovaSky-AI/SkyRL
[58] W. Fu, J. Gao, X. Shen, C. Zhu, Z. Mei, C. He, et al., “AReaL: A large-scale asynchronous reinforcement learning system for language reasoning,” arXiv preprint arXiv: 2505.24298, 2025.
[59] J. Wang, S. Wang, Y. Cui, X. Chen, C. Wang, X. Zhang, et al., “SeamlessFlow: A trainer agent isolation RL framework achieving bubble-free pipelines via tag scheduling,” arXiv preprint arXiv: 2508.11553, 2025.
[60] Z. Han, A. You, H. Wang, K. Luo, G. Yang, W. Shi, et al., “AsyncFlow: An asynchronous streaming RL framework for efficient LLM post-training,” arXiv preprint arXiv: 2507.01663, 2025.
[61] C. Yu, S. Lu, C. Zhuang, D. Wang, Q. Wu, Z. Li, et al., “AWorld: Orchestrating the training recipe for agentic AI,” arXiv preprint arXiv: 2508.20404, 2025.
[62] W. Brown. (2025). Verifiers: Environments for LLM reinforcement learning [Online]. Available: https://github.com/PrimeIntellect-ai/verifiers
[63] W. Kwon, Z. Li, S. Zhuang, Y. Sheng, L. Zheng, C. H. Yu, J. Gonzalez, H. Zhang, and I. Stoica, “Efficient memory management for large language model serving with PagedAttention,” in Proc. 29th Symp. on Operating Systems Principles, Koblenz, Germany, 2023, pp. 611−626.
[64] C. E. Jimenez, J. Yang, A. Wettig, S. Yao, K. Pei, O. Press, and K. R. Narasimhan, “SWE-bench: Can language models resolve real-world GitHub issues?” in Proc. Twelfth Int. Conf. Learning Representations, Vienna, Austria, 2024.
[65] Z. Deng, W. Ma, Q.-L. Han, W. Zhou, X. Zhu, S. Wen, and Y. Xiang, “Exploring DeepSeek: A survey on advances, applications, challenges and future directions,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 5, pp. 872–893, May 2025. doi: 10.1109/JAS.2025.125498
[66] Z. Xi, J. Huang, C. Liao, B. Huang, H. Guo, J. Liu, et al., “AgentGym-RL: Training LLM agents for long-horizon decision making through multi-turn reinforcement learning,” arXiv preprint arXiv: 2509.08755, 2025.
[67] K. Team, A. Du, B. Gao, B. Xing, C. Jiang, C. Chen, et al., “Kimi k1.5: Scaling reinforcement learning with LLMs,” arXiv preprint arXiv: 2501.12599, 2025.
[68] Meituan LongCat Team, B. Wang, Bayan, B. Xiao, B. Zhang, B. Rong, et al., “LongCat-Flash-Omni technical report,” arXiv preprint arXiv: 2511.00279, 2025.
[69] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. R. Narasimhan, and Y. Cao, “ReAct: Synergizing reasoning and acting in language models,” in Proc. Eleventh Int. Conf. Learning Representations, Kigali, Rwanda, 2023.
[70] N. Shinn, F. Cassano, A. Gopinath, K. Narasimhan, and S. Yao, “Reflexion: Language agents with verbal reinforcement learning,” in Proc. 37th Int. Conf. Neural Information Processing Systems, New Orleans, USA, 2023, Art. no. 377.
[71] A. Asai, Z. Wu, Y. Wang, A. Sil, and H. Hajishirzi, “Self-RAG: Learning to retrieve, generate, and critique through self-reflection,” in Proc. Twelfth Int. Conf. Learning Representations, Vienna, Austria, 2024.
[72] Q. Tang, Z. Deng, H. Lin, X. Han, Q. Liang, B. Cao, and L. Sun, “ToolAlpaca: Generalized tool learning for language models with 3000 simulated cases,” arXiv preprint arXiv: 2306.05301, 2023.
[73] Y. Qin, S. Liang, Y. Ye, K. Zhu, L. Yan, Y. Lu, et al., “ToolLLM: Facilitating large language models to master 16000+ real-world APIs,” in Proc. Twelfth Int. Conf. Learning Representations, Vienna, Austria, 2024.
[74] S. G. Patil, T. Zhang, X. Wang, and J. E. Gonzalez, “Gorilla: Large language model connected with massive APIs,” in Proc. 38th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2024, Art. no. 4020.
[75] P. Lu, B. Peng, H. Cheng, M. Galley, K.-W. Chang, Y. N. Wu, S.-C. Zhu, and J. Gao, “Chameleon: Plug-and-play compositional reasoning with large language models,” in Proc. 37th Int. Conf. Neural Information Processing Systems, New Orleans, USA, 2023, Art. no. 1882.
[76] S. Gao, J. Dwivedi-Yu, P. Yu, X. E. Tan, R. Pasunuru, O. Golovneva, K. Sinha, A. Celikyilmaz, A. Bosselut, and T. Wang, “Efficient tool use with chain-of-abstraction reasoning,” in Proc. 31st Int. Conf. Computational Linguistics, Abu Dhabi, UAE, 2025, pp. 2727−2743.
[77] S. Chen, Y. Wang, Y.-F. Wu, Q. Chen, Z. Xu, W. Luo, K. Zhang, and L. Zhang, “Advancing tool-augmented large language models: Integrating insights from errors in inference trees,” in Proc. 38th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2024, Art. no. 3382.
[78] Y. Ruan, H. Dong, A. Wang, S. Pitis, Y. Zhou, J. Ba, Y. Dubois, C. J. Maddison, and T. Hashimoto, “Identifying the risks of LM agents with an LM-emulated sandbox,” in Proc. Twelfth Int. Conf. Learning Representations, Vienna, Austria, 2024.
[79] S. Yuan, K. Song, J. Chen, X. Tan, Y. Shen, K. Ren, D. Li, and D. Yang, “EASYTOOL: Enhancing LLM-based agents with concise tool instruction,” in Proc. Conf. Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, Albuquerque, USA, 2025, pp. 951−972.
[80] Y. Shen, K. Song, X. Tan, D. Li, W. Lu, and Y. Zhuang, “HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face,” in Proc. 37th Int. Conf. Neural Information Processing Systems, New Orleans, USA, 2023, Art. no. 1657.
[81] Z. Zhao, W. S. Lee, and D. Hsu, “Large language models as commonsense knowledge for large-scale task planning,” in Proc. 37th Int. Conf. Neural Information Processing Systems, New Orleans, USA, 2023, Art. no. 1387.
[82] A. Zhou, K. Yan, M. Shlapentokh-Rothman, H. Wang, and Y.-X. Wang, “Language agent tree search unifies reasoning, acting, and planning in language models,” in Proc. 41st Int. Conf. Machine Learning, Vienna, Austria, 2024, Art. no. 2572.
[83] B. Liu, Y. Jiang, X. Zhang, Q. Liu, S. Zhang, J. Biswas, and P. Stone, “LLM+P: Empowering large language models with optimal planning proficiency,” arXiv preprint arXiv: 2304.11477, 2023.
[84] L. E. Erdogan, N. Lee, S. Kim, S. Moon, H. Furuta, G. Anumanchipalli, K. Keutzer, and A. Gholami, “Plan-and-Act: Improving planning of agents for long-horizon tasks,” in Proc. 42nd Int. Conf. Machine Learning, Vancouver, Canada, 2025.
[85] W. Huang, P. Abbeel, D. Pathak, and I. Mordatch, “Language models as zero-shot planners: Extracting actionable knowledge for embodied agents,” in Proc. 39th Int. Conf. Machine Learning, Baltimore, USA, 2022, pp. 9118−9147.
[86] W. Huang, F. Xia, T. Xiao, H. Chan, J. Liang, P. Florence, et al., “Inner monologue: Embodied reasoning through planning with language models,” in Proc. 6th Conf. Robot Learning, Auckland, New Zealand, 2023, pp. 1769−1782.
[87] C. H. Song, B. M. Sadler, J. Wu, W.-L. Chao, C. Washington, and Y. Su, “LLM-Planner: Few-shot grounded planning for embodied agents with large language models,” in Proc. IEEE/CVF Int. Conf. Computer Vision, Paris, France, 2023, pp. 2998−3009.
[88] S. Qiao, R. Fang, N. Zhang, Y. Zhu, X. Chen, S. Deng, Y. Jiang, P. Xie, F. Huang, and H. Chen, “Agent planning with world knowledge model,” in Proc. 38th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2024, Art. no. 3646.
[89] A. Maharana, D.-H. Lee, S. Tulyakov, M. Bansal, F. Barbieri, and Y. Fang, “Evaluating very long-term conversational memory of LLM agents,” in Proc. 62nd Annu. Meet. of the Association for Computational Linguistics, Bangkok, Thailand, 2024, pp. 13851−13870.
[90] W. Zhong, L. Guo, Q. Gao, H. Ye, and Y. Wang, “MemoryBank: Enhancing large language models with long-term memory,” in Proc. 38th AAAI Conf. Artificial Intelligence, Vancouver, Canada, 2024, pp. 19724−19731.
[91] W. Wang, L. Dong, H. Cheng, X. Liu, X. Yan, J. Gao, and F. Wei, “Augmenting language models with long-term memory,” in Proc. 37th Int. Conf. Neural Information Processing Systems, New Orleans, USA, 2023, Art. no. 3259.
[92] C. Packer, S. Wooders, K. Lin, V. Fang, S. G. Patil, I. Stoica, and J. E. Gonzalez, “MemGPT: Towards LLMs as operating systems,” arXiv preprint arXiv: 2310.08560, 2023.
[93] K.-H. Lee, X. Chen, H. Furuta, J. Canny, and I. Fischer, “A human-inspired reading agent with gist memory of very long contexts,” in Proc. 41st Int. Conf. Machine Learning, Vienna, Austria, 2024, Art. no. 1054.
[94] W. Xu, Z. Liang, K. Mei, H. Gao, J. Tan, and Y. Zhang, “A-MEM: Agentic memory for LLM agents,” in Proc. 39th Int. Conf. Neural Information Processing Systems, San Diego, USA, 2025.
[95] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, et al., “Retrieval-augmented generation for knowledge-intensive NLP tasks,” in Proc. 34th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2020, Art. no. 793.
[96] G. Izacard, P. Lewis, M. Lomeli, L. Hosseini, F. Petroni, T. Schick, J. Dwivedi-Yu, A. Joulin, S. Riedel, and E. Grave, “Atlas: Few-shot learning with retrieval augmented language models,” J. Mach. Learn. Res., vol. 24, no. 251, pp. 1–43, Jul. 2023.
[97] B. Peng, Y. Zhu, Y. Liu, X. Bo, H. Shi, C. Hong, Y. Zhang, and S. Tang, “Graph retrieval-augmented generation: A survey,” ACM Trans. Inform. Syst., vol. 44, no. 2, Art. no. 35, Feb. 2026.
[98] D.-Y. Zhou, X. Xue, Q. Ma, C. Guo, L.-Z. Cui, Y.-L. Tian, J. Yang, and F.-Y. Wang, “Federated experiments: Generative causal inference powered by LLM-based agents simulation and RAG-based domain docking,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 7, pp. 1301–1304, Jul. 2025. doi: 10.1109/JAS.2024.124671
[99] H. Zhu, J. Lu, Y. Lou, and X. Yang, “Distributed saturated impulsive quasi-consensus for leader-follower multi-agent systems: An open topology framework,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 9, pp. 1941–1943, Sep. 2025. doi: 10.1109/JAS.2024.124743
[100] Y. Li, Y. Zhang, X. Li, and C. Sun, “Regional multi-agent cooperative reinforcement learning for city-level traffic grid signal control,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 9, pp. 1987–1998, Sep. 2024. doi: 10.1109/JAS.2024.124365
[101] G. Li, H. A. A. K. Hammoud, H. Itani, D. Khizbullin, and B. Ghanem, “CAMEL: Communicative agents for ‘mind’ exploration of large language model society,” in Proc. 37th Int. Conf. Neural Information Processing Systems, New Orleans, USA, 2023, Art. no. 2264.
[102] W. Chen, Y. Su, J. Zuo, C. Yang, C. Yuan, C.-M. Chan, et al., “AgentVerse: Facilitating multi-agent collaboration and exploring emergent behaviors,” in Proc. Twelfth Int. Conf. Learning Representations, Vienna, Austria, 2024.
[103] Z. Wang, S. Mao, W. Wu, T. Ge, F. Wei, and H. Ji, “Unleashing the emergent cognitive synergy in large language models: A task-solving agent through multi-persona self-collaboration,” in Proc. Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Mexico City, Mexico, 2024, pp. 257−279.
[104] S. Hong, M. Zhuge, J. Chen, X. Zheng, Y. Cheng, J. Wang, et al., “MetaGPT: Meta programming for a multi-agent collaborative framework,” in Proc. Twelfth Int. Conf. Learning Representations, Vienna, Austria, 2024.
[105] L. H. Klein, N. Potamitis, R. Aydin, R. West, C. Gulcehre, and A. Arora, “Fleet of agents: Coordinated problem solving with large language models,” in Proc. 42nd Int. Conf. Machine Learning, Vancouver, Canada, 2025.
[106] C.-M. Chan, W. Chen, Y. Su, J. Yu, W. Xue, S. Zhang, J. Fu, and Z. Liu, “ChatEval: Towards better LLM-based evaluators through multi-agent debate,” in Proc. Twelfth Int. Conf. Learning Representations, Vienna, Austria, 2024.
[107] T. Liang, Z. He, W. Jiao, X. Wang, Y. Wang, R. Wang, Y. Yang, S. Shi, and Z. Tu, “Encouraging divergent thinking in large language models through multi-agent debate,” in Proc. Conf. Empirical Methods in Natural Language Processing, Miami, USA, 2024, pp. 17889−17904.
[108] CrewAI Inc. (2025). CrewAI [Online]. Available: https://github.com/crewAIInc/crewAI
[109] D. Chen, H. Zhang, H. Wang, Y. Huo, Y. Li, and J. Wang, “GameGPT: Multi-agent collaborative framework for game development,” arXiv preprint arXiv: 2310.08067, 2023.
[110] Microsoft. (2023). AutoGen [Online]. Available: https://github.com/microsoft/autogen. Accessed on: Dec. 1, 2025.
[111] OpenAI. (2024). Swarm [Online]. Available: https://github.com/openai/swarm. Accessed on: Dec. 1, 2025.
[112] L. Foundation. (2025). Agentstack [Online]. Available: https://github.com/i-am-bee/agentstack. Accessed on: Dec. 1, 2025.
[113] Amazon. (2025). Strands agents SDK for Python [Online]. Available: https://github.com/strands-agents/sdk-python. Accessed on: Dec. 1, 2025.
[114] LangChain AI. (2023). LangChain: The platform for reliable agents [Online]. Available: https://github.com/langchain-ai/langchain. Accessed on: Dec. 1, 2025.
[115] Microsoft. (2025). Semantic Kernel [Online]. Available: https://github.com/microsoft/semantic-kernel. Accessed on: Dec. 1, 2025.
[116] LlamaIndex. (2025). LlamaIndex [Online]. Available: https://www.llamaindex.ai/. Accessed on: Dec. 1, 2025.
[117] LangChain AI. (2024). LangGraph: Build resilient language agents as graphs [Online]. Available: https://github.com/langchain-ai/langgraph. Accessed on: Dec. 1, 2025.
[118] LangFlow AI. (2024). LangFlow [Online]. Available: https://github.com/langflow-ai/langflow. Accessed on: Dec. 1, 2025.
[119] J. Xu, Q. Sun, Q.-L. Han, and Y. Tang, “When embodied AI meets industry 5.0: Human-centered smart manufacturing,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 3, pp. 485–501, Mar. 2025. doi: 10.1109/JAS.2025.125327
[120] T. Shen, J. Sun, S. Kong, Y. Wang, J. Li, X. Li, and F.-Y. Wang, “The journey/DAO/TAO of embodied intelligence: From large models to foundation intelligence and parallel intelligence,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 6, pp. 1313–1316, Jun. 2024. doi: 10.1109/JAS.2024.124407
[121] H. Tan, Y. Feng, X. Mao, S. Huang, G. Liu, Z. Hao, H. Su, and J. Zhu, “AnyPos: Automated task-agnostic actions for bimanual manipulation,” arXiv preprint arXiv: 2507.12768, 2025.
[122] Y. Qin, W. Yang, B. Huang, K. Van Wyk, H. Su, X. Wang, Y.-W. Chao, and D. Fox, “AnyTeleop: A general vision-based dexterous robot arm-hand teleoperation system,” in Proc. Robotics: Science and Systems XIX, Daegu, Republic of Korea, 2023.
[123] Y. Feng, H. Tan, X. Mao, C. Xiang, G. Liu, S. Huang, H. Su, and J. Zhu, “Vidar: Embodied video diffusion model for generalist manipulation,” arXiv preprint arXiv: 2507.12898, 2025.
[124] M. J. Kim, K. Pertsch, S. Karamcheti, T. Xiao, A. Balakrishna, S. Nair, et al., “OpenVLA: An open-source vision-language-action model,” in Proc. 8th Conf. Robot Learning, Munich, Germany, 2024, pp. 2679−2713.
[125] K. Black, N. Brown, D. Driess, A. Esmail, M. Equi, C. Finn, et al., “π0: A vision-language-action flow model for general robot control,” arXiv preprint arXiv: 2410.24164, 2024.
[126] J. Bjorck, F. Castañeda, N. Cherniadev, X. Da, R. Ding, L. Fan, et al., “GR00T N1: An open foundation model for generalist humanoid robots,” arXiv preprint arXiv: 2503.14734, 2025.
[127] R. Cadene, S. Alibert, A. Soare, Q. Gallouedec, A. Zouitine, S. Palma, et al. (2024). LeRobot: State-of-the-art machine learning for real-world robotics in PyTorch [Online]. Available: https://github.com/huggingface/lerobot
[128] C. Yu, Y. Wang, Z. Guo, H. Lin, S. Xu, H. Zang, et al., “RLinf: Flexible and efficient large-scale reinforcement learning via macro-to-micro flow transformation,” arXiv preprint arXiv: 2509.15965, 2025.
[129] B. Liu, Y. Zhu, C. Gao, Y. Feng, Q. Liu, Y. Zhu, and P. Stone, “LIBERO: Benchmarking knowledge transfer for lifelong robot learning,” in Proc. 37th Int. Conf. Neural Information Processing Systems, New Orleans, USA, 2023, Art. no. 1939.
[130] A. Allshire, H. Choi, J. Zhang, D. McAllister, A. Zhang, C. M. Kim, T. Darrell, P. Abbeel, J. Malik, and A. Kanazawa, “Visual imitation enables contextual humanoid control,” arXiv preprint arXiv: 2505.03729, 2025.
[131] C. Wang, Z. Wang, W. Sheng, Q. Liu, and H. Dong, “Fuzzy domain adaptation via variational inference for evolving concept drift,” IEEE Trans. Fuzzy Syst., vol. 33, no. 7, pp. 2361–2375, Jul. 2025. doi: 10.1109/TFUZZ.2025.3567089
[132] C. Wang, Z. Wang, Q. Liu, H. Dong, W. Liu, and X. Liu, “A comprehensive survey on domain adaptation for intelligent fault diagnosis,” Knowl.-Based Syst., vol. 37, Art. no. 114109, 2025.
[133] S. Zhou, F. F. Xu, H. Zhu, X. Zhou, R. Lo, A. Sridhar, et al., “WebArena: A realistic web environment for building autonomous agents,” in Proc. Twelfth Int. Conf. Learning Representations, Vienna, Austria, 2024.