A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical/experimental research and development in all areas of automation.
Volume 10 Issue 9
Sep. 2023

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Y. L. Gong, J. H. Zhou, Q. W. Wu, M. C. Zhou, and J. H. Wen, “A length-adaptive non-dominated sorting genetic algorithm for bi-objective high-dimensional feature selection,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 9, pp. 1834–1844, Sept. 2023. doi: 10.1109/JAS.2023.123648

A Length-Adaptive Non-Dominated Sorting Genetic Algorithm for Bi-Objective High-Dimensional Feature Selection

doi: 10.1109/JAS.2023.123648
Funding: This work was supported in part by the National Natural Science Foundation of China (62172065, 62072060).
As a crucial data preprocessing method in data mining, feature selection (FS) can be regarded as a bi-objective optimization problem that aims to maximize classification accuracy and minimize the number of selected features. Evolutionary computing (EC) is promising for FS owing to its powerful search capability. However, traditional EC-based methods represent feature subsets with a fixed-length individual encoding. This encoding is ineffective for high-dimensional data because it results in a huge search space and prohibitive training time. This work proposes a length-adaptive non-dominated sorting genetic algorithm (LA-NSGA) with a length-variable individual encoding and a length-adaptive evolution mechanism for bi-objective high-dimensional FS. In LA-NSGA, an initialization method based on correlation and redundancy is devised to initialize individuals of diverse lengths, and a Pareto dominance-based length change operator is introduced to guide individuals to adaptively explore promising regions of the search space. Moreover, a dominance-based local search method is employed for further improvement. Experimental results on 12 high-dimensional gene datasets show that the Pareto front of feature subsets produced by LA-NSGA is superior to those of existing algorithms.
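The bi-objective formulation above can be made concrete with the Pareto-dominance ranking that non-dominated sorting genetic algorithms are built on. The following is a minimal sketch, assuming the standard dominance definition and the fast non-dominated sort of NSGA-II (Deb et al., 2002), with maximizing accuracy recast as minimizing the error rate (1 − accuracy). It shows only this generic ranking step; LA-NSGA's length change operator and local search are not reproduced, and the function names are illustrative.

```python
from typing import List, Tuple

# Each candidate feature subset is scored on two objectives, both minimized:
# (classification error rate, number of selected features).
Objectives = Tuple[float, int]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a Pareto-dominates b: no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(points: List[Objectives]) -> List[List[int]]:
    """Group point indices into successive Pareto fronts (generic NSGA-II procedure)."""
    n = len(points)
    dominated_by: List[List[int]] = [[] for _ in range(n)]  # indices that point i dominates
    domination_count = [0] * n  # number of points that dominate point i
    fronts: List[List[int]] = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(points[i], points[j]):
                dominated_by[i].append(j)
            elif dominates(points[j], points[i]):
                domination_count[i] += 1
        if domination_count[i] == 0:
            fronts[0].append(i)  # nobody dominates i: first (best) front
    k = 0
    while fronts[k]:
        next_front: List[int] = []
        for i in fronts[k]:
            for j in dominated_by[i]:
                domination_count[j] -= 1
                if domination_count[j] == 0:
                    next_front.append(j)  # all of j's dominators are in earlier fronts
        fronts.append(next_front)
        k += 1
    return fronts[:-1]  # drop the trailing empty front
```

For example, with hypothetical scores `[(0.10, 20), (0.05, 50), (0.10, 30), (0.20, 10), (0.05, 60)]`, `fast_non_dominated_sort` returns `[[0, 1, 3], [2, 4]]`: subset 2 is dominated by subset 0 (same error, more features), and subset 4 by subset 1.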

     



    Corresponding author: Chen Bin, bchen63@163.com
    1. School of Materials Science and Engineering, Shenyang University of Chemical Technology, Shenyang 110142


    Figures (7) / Tables (4)


    Highlights

    • A bi-objective high-dimensional feature selection method called LA-NSGA is proposed
    • A length-variable individual encoding and a length-adaptive evolution mechanism are used
    • Experimental results on 12 gene datasets verify the superiority of LA-NSGA
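The highlights rely on a length-variable individual encoding, but the abstract does not spell out its details. The following is a minimal sketch of one common design, assumed here rather than taken from the paper. All features are ranked once by relevance to the class label, and an individual of length L is a 0/1 mask over only the top-L ranked features. A short individual therefore searches 2^L subsets instead of 2^D. Absolute Pearson correlation is a stand-in for the paper's correlation measure, redundancy handling is omitted, and the helper names (`pearson_abs`, `rank_features`, `random_individual`, `decode`) are hypothetical.

```python
import math
import random
from typing import List, Sequence

# ASSUMPTION: relevance = |Pearson correlation| with the label; this is a
# stand-in, not the correlation/redundancy measure used by LA-NSGA.


def pearson_abs(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else abs(cov / (sx * sy))


def rank_features(X: List[List[float]], y: List[float]) -> List[int]:
    """Feature indices sorted by decreasing relevance (ties keep lower index first)."""
    n_features = len(X[0])
    scores = [pearson_abs([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: -scores[j])


def random_individual(length: int, rng: random.Random) -> List[int]:
    """Hypothetical length-L individual: a 0/1 mask over the top-L ranked features."""
    return [rng.randint(0, 1) for _ in range(length)]


def decode(mask: List[int], ranking: List[int]) -> List[int]:
    """Map set mask positions back to original feature indices."""
    return [ranking[pos] for pos, bit in enumerate(mask) if bit]
```

Drawing L from a range when creating the population gives individuals of diverse lengths, loosely in the spirit of the initialization the abstract describes. For any individual, the second objective (number of selected features) is simply `sum(mask)`.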
