Citation: Y. Tian, Y. Liu, S. Yang, and X. Zhang, “Deep reinforcement learning based on search space independent operators for black-box continuous optimization,” IEEE/CAA J. Autom. Sinica, vol. 13, no. 4, pp. 913–925, Apr. 2026. doi: 10.1109/JAS.2025.125444
[1] Y. Tian, Y. Feng, C. Wang, R. Cao, X. Zhang, X. Pei, K. C. Tan, and Y. Jin, “A large-scale combinatorial many-objective evolutionary algorithm for intensity-modulated radiotherapy planning,” IEEE Trans. Evolutionary Computation, vol. 26, no. 6, pp. 1511–1525, 2022. doi: 10.1109/TEVC.2022.3144675
[2] L. M. Ochoa-Estopier and M. Jobson, “Optimization of heat-integrated crude oil distillation systems. Part I: The distillation model,” Industrial and Engineering Chemistry Research, vol. 54, no. 18, pp. 4988–5000, 2015.
[3] C. He, R. Cheng, C. Zhang, Y. Tian, Q. Chen, and X. Yao, “Evolutionary large-scale multiobjective optimization for ratio error estimation of voltage transformers,” IEEE Trans. Evolutionary Computation, vol. 24, no. 5, pp. 868–881, 2020. doi: 10.1109/TEVC.2020.2967501
[4] K. G. Murty, Linear Programming. Hoboken, USA: John Wiley & Sons, 1983.
[5] M. Li, “Generalized Lagrange multiplier method and KKT conditions with an application to distributed optimization,” IEEE Trans. Circuits and Systems II: Express Briefs, vol. 66, no. 2, pp. 252–256, 2019. doi: 10.1109/TCSII.2018.2842085
[6] S. S. Petrova and A. D. Solov’ev, “The origin of the method of steepest descent,” Historia Mathematica, vol. 24, no. 4, pp. 361–375, 1997.
[7] J. Mockus, Bayesian Approach to Global Optimization: Theory and Applications. Dordrecht, The Netherlands: Kluwer Academic Publishers, 1989.
[8] Y. Tian, H. Chen, X. Xiang, H. Jiang, and X. Zhang, “A comparative study on evolutionary algorithms and mathematical programming methods for continuous optimization,” in Proc. the IEEE Congress on Evolutionary Computation, 2022.
[9] D. H. Wolpert and W. G. Macready, “No free lunch theorems for optimization,” IEEE Trans. Evolutionary Computation, vol. 1, no. 1, pp. 67–82, 1997. doi: 10.1109/4235.585893
[10] Y. Tian, S. Peng, X. Zhang, T. Rodemann, K. C. Tan, and Y. Jin, “A recommender system for metaheuristic algorithms for continuous optimization based on deep recurrent neural networks,” IEEE Trans. Artificial Intelligence, vol. 1, no. 1, pp. 5–18, 2020.
[11] J. H. Holland, Adaptation in Natural and Artificial Systems. Cambridge, MA: MIT Press, 1992.
[12] R. Storn and K. Price, “Differential evolution—A simple and efficient heuristic for global optimization over continuous spaces,” J. Global Optimization, vol. 11, no. 4, pp. 341–359, 1997. doi: 10.1023/A:1008202821328
[13] N. Hansen and A. Ostermeier, “Completely derandomized self-adaptation in evolution strategies,” Evolutionary Computation, vol. 9, no. 2, pp. 159–195, 2001. doi: 10.1162/106365601750190398
[14] Y. Tian, X. Li, H. Ma, X. Zhang, K. C. Tan, and Y. Jin, “Deep reinforcement learning based adaptive operator selection for evolutionary multi-objective optimization,” IEEE Trans. Emerging Topics in Computational Intelligence, vol. 7, no. 4, pp. 1051–1064, 2023.
[15] H. Tong, S. Zhang, C. Huang, and X. Yao, “Algorithm portfolio for parameter tuned evolutionary algorithms,” in Proc. the IEEE Symposium Series on Computational Intelligence, 2019, pp. 1849–1856.
[16] N. Mazyavkina, S. Sviridov, S. Ivanov, and E. Burnaev, “Reinforcement learning for combinatorial optimization: A survey,” Computers and Operations Research, vol. 134, Art. no. 105400, 2021.
[17] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement learning: A survey,” J. Artificial Intelligence Research, vol. 4, pp. 237–285, 1996.
[18] H. A. Nomer, A. W. Mohamed, and A. H. Yousef, “GSK-RL: Adaptive gaining-sharing knowledge algorithm using reinforcement learning,” in Proc. the 3rd Novel Intelligent and Leading Emerging Sciences Conf., Giza, Egypt, 2021, pp. 169–174.
[19] Y. Tian, X. Zhang, C. He, K. C. Tan, and Y. Jin, “Principled design of translation, scale, and rotation invariant variation operators for metaheuristics,” Chinese J. Electronics, vol. 32, no. 1, pp. 111–129, 2023.
[20] N. Agatz, P. Bouman, and M. Schmidt, “Optimization approaches for the traveling salesman problem with drone,” Transportation Science, vol. 52, no. 4, pp. 965–981, 2018.
[21] G. Xia, Z. Tang, J. Wang, R. Wang, Y. Li, and G. Xia, “A new parallel improvement algorithm for maximum cut problem,” in Advances in Neural Networks – ISNN 2004, vol. 3173, 2004, pp. 419–424.
[22] I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio, “Neural combinatorial optimization with reinforcement learning,” arXiv preprint arXiv:1611.09940, 2016.
[23] M. Deudon, P. Cournut, A. Lacoste, Y. Adulyasak, and L.-M. Rousseau, “Learning heuristics for the TSP by policy gradient,” in Integration of Constraint Programming, Artificial Intelligence, and Operations Research. Springer, 2018, pp. 170–181.
[24] W. Kool, H. van Hoof, and M. Welling, “Attention, learn to solve routing problems!” arXiv preprint arXiv:1803.08475, 2018.
[25] Q. Ma, S. Ge, D. He, D. Thaker, and I. Drori, “Combinatorial optimization by graph pointer networks and hierarchical reinforcement learning,” arXiv preprint arXiv:1911.04936, 2019.
[26] X. Chen and Y. Tian, “Learning to perform local rewriting for combinatorial optimization,” in Proc. the 33rd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, pp. 6281–6292.
[27] L. Gao, M. Chen, Q. Chen, G. Luo, N. Zhu, and Z. Liu, “Learn to design the heuristics for vehicle routing problem,” arXiv preprint arXiv:2002.08539, 2020.
[28] Y. Wu, W. Song, Z. Cao, J. Zhang, and A. Lim, “Learning improvement heuristics for solving routing problems,” IEEE Trans. Neural Networks and Learning Systems, vol. 33, no. 9, pp. 5057–5069, 2022.
[29] Z. Zheng, S. Yao, G. Li, L. Han, and Z. Wang, “Pareto improver: Learning improvement heuristics for multi-objective route planning,” IEEE Trans. Intelligent Transportation Systems, vol. 25, no. 1, pp. 1033–1043, 2024.
[30] J. Sun, X. Liu, T. Bäck, and Z. Xu, “Learning adaptive differential evolution algorithm from optimization experiences by policy gradient,” IEEE Trans. Evolutionary Computation, vol. 25, no. 4, pp. 666–680, 2021. doi: 10.1109/TEVC.2021.3060811
[31] A. Draa, S. Bouzoubia, and I. Boukhalfa, “A sinusoidal differential evolution algorithm for numerical optimisation,” Applied Soft Computing, vol. 27, pp. 99–126, 2015.
[32] S. Das, A. Konar, and U. K. Chakraborty, “Two improved differential evolution schemes for faster global search,” in Proc. the 7th Annual Conf. on Genetic and Evolutionary Computation, 2005, pp. 991–998.
[33] R. Tanabe and A. S. Fukunaga, “Improving the search performance of SHADE using linear population size reduction,” in Proc. the IEEE Congress on Evolutionary Computation, Beijing, China, 2014, pp. 1658–1665.
[34] J. Brest, S. Greiner, B. Boskovic, M. Mernik, and V. Zumer, “Self-adapting control parameters in differential evolution: A comparative study on numerical benchmark problems,” IEEE Trans. Evolutionary Computation, vol. 10, no. 6, pp. 646–657, 2006.
[35] A. K. Qin and P. N. Suganthan, “Self-adaptive differential evolution algorithm for numerical optimization,” in Proc. the IEEE Congress on Evolutionary Computation, Edinburgh, UK, 2005, pp. 1785–1791.
[36] J. Zhang and A. C. Sanderson, “JADE: Adaptive differential evolution with optional external archive,” IEEE Trans. Evolutionary Computation, vol. 13, no. 5, pp. 945–958, 2009.
[37] K. M. Sallam, S. M. Elsayed, R. K. Chakrabortty, and M. J. Ryan, “Improved multi-operator differential evolution algorithm for solving unconstrained problems,” in Proc. the IEEE Congress on Evolutionary Computation, Glasgow, UK, 2020, pp. 1–8.
[38] R. Tanabe and A. Fukunaga, “Success-history based parameter adaptation for differential evolution,” in Proc. the IEEE Congress on Evolutionary Computation, Cancun, Mexico, 2013, pp. 71–78.
[39] F. Zhao, F. Ji, T. Xu, N. Zhu, and Jonrinaldi, “Hierarchical parallel search with automatic parameter configuration for particle swarm optimization,” Applied Soft Computing, vol. 151, Art. no. 111126, 2024.
[40] G. Karafotias, A. E. Eiben, and M. Hoogendoorn, “Generic parameter control with reinforcement learning,” in Proc. the Annual Conf. on Genetic and Evolutionary Computation, 2014, pp. 1319–1326.
[41] H. Zhang, J. Sun, K. C. Tan, and Z. Xu, “Learning adaptive differential evolution by natural evolution strategies,” IEEE Trans. Emerging Topics in Computational Intelligence, vol. 7, no. 3, pp. 872–886, 2023.
[42] Y. Liu, H. Lu, S. Cheng, and Y. Shi, “An adaptive online parameter control algorithm for particle swarm optimization based on reinforcement learning,” in Proc. the IEEE Congress on Evolutionary Computation, Wellington, New Zealand, 2019, pp. 815–822.
[43] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 2018.
[44] R. Tinós, “Artificial neural network based crossover for evolutionary algorithms,” Applied Soft Computing, vol. 95, Art. no. 106512, 2020.
[45] C. He, S. Huang, R. Cheng, K. C. Tan, and Y. Jin, “Evolutionary multiobjective optimization driven by generative adversarial networks (GANs),” IEEE Trans. Cybernetics, vol. 51, no. 6, pp. 3129–3142, 2021.
[46] J. Kudela, “A critical problem in benchmarking and analysis of evolutionary computation methods,” Nature Machine Intelligence, vol. 4, pp. 1238–1245, 2022.
[47] K. Sörensen, “Metaheuristics—The metaphor exposed,” Int. Trans. in Operational Research, vol. 22, no. 1, pp. 3–18, 2015.
[48] N. Hansen, R. Ros, N. Mauny, M. Schoenauer, and A. Auger, “Impacts of invariance in search: When CMA-ES and PSO face ill-conditioned and non-separable problems,” Applied Soft Computing, vol. 11, no. 8, pp. 5755–5769, 2011.
[49] R. Eberhart and J. Kennedy, “A new optimizer using particle swarm theory,” in Proc. the 6th Int. Symposium on Micro Machine and Human Science, Nagoya, Japan, 1995, pp. 39–43.
[50] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
[51] K. Deb and M. Goyal, “A combined genetic adaptive search (GeneAS) for engineering design,” Computer Science and Informatics, vol. 26, no. 4, pp. 30–45, 1996.
[52] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE Trans. Evolutionary Computation, vol. 6, no. 2, pp. 182–197, 2002.
[53] X. Chu, F. Cai, C. Cui, M. Hu, L. Li, and Q. Qin, “Adaptive recommendation model using meta-learning for population-based algorithms,” Information Sciences, vol. 476, pp. 192–210, 2019.
[54] M. Sharma, A. Komninos, M. López-Ibáñez, and D. Kazakov, “Deep reinforcement learning based parameter control in differential evolution,” in Proc. the Genetic and Evolutionary Computation Conf., 2019, pp. 709–717.
[55] S. Zhao, T. Zhang, S. Ma, and M. Chen, “Dandelion optimizer: A nature-inspired metaheuristic algorithm for engineering applications,” Engineering Applications of Artificial Intelligence, vol. 114, Art. no. 105075, 2022.
[56] B. Abdollahzadeh, F. S. Gharehchopogh, N. Khodadadi, and S. Mirjalili, “Mountain gazelle optimizer: A new nature-inspired metaheuristic algorithm for global optimization problems,” Advances in Engineering Software, vol. 174, Art. no. 103282, 2022.
[57] Y. Tian, R. Cheng, X. Zhang, and Y. Jin, “PlatEMO: A MATLAB platform for evolutionary multi-objective optimization [Educational Forum],” IEEE Computational Intelligence Magazine, vol. 12, no. 4, pp. 73–87, 2017.
[58] X. Yao, Y. Liu, and G. Lin, “Evolutionary programming made faster,” IEEE Trans. Evolutionary Computation, vol. 3, no. 2, pp. 82–102, 1999.
[59] J. Derrac, S. García, D. Molina, and F. Herrera, “A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms,” Swarm and Evolutionary Computation, vol. 1, no. 1, pp. 3–18, 2011. doi: 10.1016/j.swevo.2011.02.002
[60] Y. Tian, X. Zheng, X. Zhang, and Y. Jin, “Efficient large-scale multiobjective optimization based on a competitive swarm optimizer,” IEEE Trans. Cybernetics, vol. 50, no. 8, pp. 3696–3708, 2020. doi: 10.1109/TCYB.2019.2906383
[61] Y. Tian, H. Chen, H. Ma, X. Zhang, K. C. Tan, and Y. Jin, “Integrating conjugate gradients into evolutionary algorithms for large-scale continuous multi-objective optimization,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 10, pp. 1801–1817, 2022. doi: 10.1109/JAS.2022.105875
[62] Y. Yuan, X. Luo, M. Shang, and Z. Wang, “A Kalman-filter-incorporated latent factor analysis model for temporally dynamic sparse data,” IEEE Trans. Cybernetics, vol. 53, no. 9, pp. 5788–5801, 2023. doi: 10.1109/TCYB.2022.3185117
[63] J. Li, X. Luo, Y. Yuan, and S. Gao, “A nonlinear PID-incorporated adaptive stochastic gradient descent algorithm for latent factor analysis,” IEEE Trans. Autom. Science and Engineering, vol. 21, no. 3, pp. 3742–3756, 2024. doi: 10.1109/TASE.2023.3284819
[64] Y. Yuan, J. Li, and X. Luo, “A fuzzy PID-incorporated stochastic gradient descent algorithm for fast and accurate latent factor analysis,” IEEE Trans. Fuzzy Systems, vol. 32, no. 7, pp. 4049–4061, 2024. doi: 10.1109/TFUZZ.2024.3389733
[65] X. Xiang, Y. Tian, J. Xiao, and X. Zhang, “A clustering-based surrogate-assisted multiobjective evolutionary algorithm for shelter location under uncertainty of road networks,” IEEE Trans. Industrial Informatics, vol. 16, no. 12, pp. 7544–7555, 2020. doi: 10.1109/TII.2019.2962137