A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical and experimental research and development in all areas of automation
Volume 13 Issue 4
Apr.  2026

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 19.2, Top 1 (SCI Q1)
  • CiteScore: 28.2, Top 1% (Q1)
  • Google Scholar h5-index: 95, Top 5
Y. Tian, Y. Liu, S. Yang, and X. Zhang, “Deep reinforcement learning based on search space independent operators for black-box continuous optimization,” IEEE/CAA J. Autom. Sinica, vol. 13, no. 4, pp. 913–925, Apr. 2026. doi: 10.1109/JAS.2025.125444

Deep Reinforcement Learning Based on Search Space Independent Operators for Black-Box Continuous Optimization

doi: 10.1109/JAS.2025.125444
Funds:  This work was supported in part by the National Natural Science Foundation of China (62136008, 62276001, U21A20512, W2441019), the Anhui Provincial Natural Science Foundation (2308085J03), and the Excellent Youth Foundation of Anhui Provincial Colleges (2022AH030013)
  • Abstract: Deep reinforcement learning (DRL) has demonstrated exceptional capabilities in combinatorial optimization, automatically devising policies for solution construction and optimizer refinement. DRL is particularly adept at generating training samples by itself, providing the flexibility to solve a variety of combinatorial optimization problems without supervision. However, because DRL takes actions according to states extracted from problem-specific information, it cannot be directly applied to black-box continuous optimization, which lacks such explicit information. To address this issue, this paper proposes a DRL method based on search space independent operators for black-box continuous optimization. It conceptualizes the optimization process driven by search space independent operators as a Markov decision process, wherein actions are defined as operators and states are extracted from the solutions generated by those operators. In contrast to other DRL-assisted metaheuristics, the proposed method does not rely on any existing metaheuristic; instead, it creates entirely new operators capable of surpassing the performance boundaries of existing metaheuristics. Compared with state-of-the-art metaheuristics and DRL methods, the proposed method converges significantly faster on challenging continuous optimization problems.
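The MDP formulation described above (actions as search space independent operators, states extracted from the solutions themselves, improvement as reward) can be illustrated with a minimal sketch. The operators, features, and the epsilon-greedy tabular policy below are hypothetical simplifications for illustration only; they stand in for the paper's learned deep policy and its actual operator set, and the sphere function stands in for an arbitrary black-box objective.

```python
import numpy as np

# Hypothetical search-space-independent operators: each builds a new
# candidate only from relationships among existing solutions, never
# from problem-specific information (illustrative, not the paper's).
def op_difference(pop, rng):
    a, b, c = pop[rng.choice(len(pop), 3, replace=False)]
    return a + 0.5 * (b - c)

def op_midpoint(pop, rng):
    a, b = pop[rng.choice(len(pop), 2, replace=False)]
    return 0.5 * (a + b)

def op_gaussian(pop, rng):
    a = pop[rng.choice(len(pop))]
    return a + rng.normal(0.0, 0.1, size=a.shape)

OPERATORS = [op_difference, op_midpoint, op_gaussian]  # the action set

def sphere(x):
    """Black-box objective: only evaluated, never inspected."""
    return float(np.sum(x ** 2))

def state_features(fit):
    """State extracted from generated solutions, not from the search space."""
    return np.array([np.std(fit), np.mean(fit) - np.min(fit)])

def optimize(steps=200, pop_size=20, dim=5, eps=0.2, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5.0, 5.0, size=(pop_size, dim))
    fit = np.array([sphere(x) for x in pop])
    init_best = float(np.min(fit))
    # Tabular epsilon-greedy values stand in for the deep policy network.
    q = np.zeros(len(OPERATORS))
    n = np.zeros(len(OPERATORS))
    for _ in range(steps):
        _ = state_features(fit)  # state (ignored by this toy policy)
        a = rng.integers(len(OPERATORS)) if rng.random() < eps else int(np.argmax(q))
        child = OPERATORS[a](pop, rng)
        child_fit = sphere(child)
        worst = int(np.argmax(fit))
        reward = max(0.0, float(fit[worst]) - child_fit)  # improvement as reward
        if child_fit < fit[worst]:
            pop[worst], fit[worst] = child, child_fit
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]  # incremental value update
    return init_best, float(np.min(fit))

init_best, final_best = optimize()
```

Because every operator acts only on existing solutions and every state is a statistic of their fitness values, nothing in the loop depends on the structure of the search space, which is the property the paper's title names.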





    Figures(3)  / Tables(7)
