A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical and experimental research and development in all areas of automation
Volume 9 Issue 4
Apr.  2022

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 11.8, Top 4% (SCI Q1)
    CiteScore: 17.6, Top 3% (Q1)
    Google Scholar h5-index: 77, TOP 5
Citation: Z. W. Zhang, S. T. Ye, Y. R. Zhang, W. P. Ding, and H. Wang, “Belief combination of classifiers for incomplete data,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 4, pp. 652–667, Apr. 2022. doi: 10.1109/JAS.2022.105458

Belief Combination of Classifiers for Incomplete Data

doi: 10.1109/JAS.2022.105458
Funds: This work was supported in part by the Center-initiated Research Project and Research Initiation Project of Zhejiang Laboratory (113012-AL2201, 113012-PI2103), the National Natural Science Foundation of China (61300167, 61976120), the Natural Science Foundation of Jiangsu Province (BK20191445), the Natural Science Key Foundation of Jiangsu Education Department (21KJA510004), and the Qing Lan Project of Jiangsu Province.
  • Data with missing values, or incomplete information, pose challenges for classification, as incompleteness may significantly degrade the performance of classifiers. In this paper, we handle missing values in both the training and test sets through uncertainty and imprecision reasoning, proposing a new belief combination of classifiers (BCC) method based on evidence theory. BCC aims to improve the classification of incomplete data by characterizing the uncertainty and imprecision caused by incompleteness. In BCC, different attributes are regarded as independent sources, and the collection of each attribute is considered a subset. Multiple classifiers are then trained independently, one on each subset, so that each observed attribute provides a sub-classification result for the query pattern. Finally, these sub-classification results, discounted with different weights (discounting factors), supply supplementary information and jointly determine the final classes of the query patterns. The weights have two components: global and local. The global weight, calculated by an optimization function, represents the reliability of each classifier; the local weight, obtained by mining attribute distribution characteristics, quantifies the importance of each observed attribute to the classification of the pattern. Extensive comparative experiments involving seven methods on twelve datasets show that BCC outperforms all baseline methods in accuracy, precision, recall, and F1 measure, with pertinent computational costs.
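The discount-then-combine step described in the abstract can be illustrated with a minimal sketch. It assumes Shafer's classical discounting and Dempster's rule of combination as stand-ins, since this page does not state which combination rule BCC uses or how its weights are computed. The function names, the two-class frame, the mass values, and the discounting factors below are all hypothetical.

```python
from itertools import product


def discount(mass, alpha, frame):
    """Shafer discounting: keep a fraction alpha of each focal mass and
    transfer the remaining 1 - alpha to the whole frame (total ignorance)."""
    out = {focal: alpha * m for focal, m in mass.items() if focal != frame}
    out[frame] = alpha * mass.get(frame, 0.0) + (1.0 - alpha)
    return out


def dempster(m1, m2):
    """Dempster's rule: conjunctive combination followed by normalization."""
    joint, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            joint[inter] = joint.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {focal: m / (1.0 - conflict) for focal, m in joint.items()}


# Hypothetical two-class frame of discernment.
frame = frozenset({"a", "b"})
A, B = frozenset({"a"}), frozenset({"b"})

# Hypothetical sub-classification results from two attribute-level classifiers.
m_attr1 = {A: 0.8, B: 0.2}
m_attr2 = {A: 0.3, B: 0.7}

# Hypothetical discounting factors; how BCC derives its global and local
# weights is not shown in this section.
d1 = discount(m_attr1, 0.9, frame)
d2 = discount(m_attr2, 0.5, frame)

fused = dempster(d1, d2)
decision = max((A, B), key=lambda s: fused.get(s, 0.0))
```

In this toy setting the less reliable second source is pushed toward ignorance before fusion, so the first source dominates the combined mass function. A source with factor 0 would contribute nothing, and a source with factor 1 would be combined unchanged.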

     

  • 1 To avoid ambiguity, we apply the term incomplete data for a dataset with missing values, and incomplete pattern for a pattern with missing values.
    2 In evidence theory, the term evidential refers to variables with both uncertainty and imprecision.
    3 This paper focuses on the classification of incomplete data, which means that the reliability weight can be obtained by the proposed optimization strategy quickly based on the training set.
    4 It is used to measure the similarity between two sets of variables. Other correlation coefficients, such as Spearman’s correlation coefficient [55], Kendall’s correlation [56] are also applicable.
    5 All results demonstrated in this paper are average values.
    6 The differences between the chosen classifiers are beyond the scope of this paper.
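Footnote 4 names Pearson's correlation coefficient as a similarity measure and notes that rank-based alternatives such as Spearman's coefficient also apply. The sketch below contrasts the two on toy data. It is a plain-Python illustration, not the paper's implementation. The helper names and the sample values are hypothetical, and Spearman's coefficient is computed here as Pearson's coefficient on average ranks.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the
    standard deviations (the common 1/n factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation as Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical attribute columns: y grows monotonically but not linearly in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

On this data the rank-based coefficient reaches 1 up to rounding because the relation is perfectly monotone, while Pearson's coefficient stays slightly below 1 because the relation is not linear.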
  • [1]
    K. Y. Chiang, I. S. Dhillon, and C. J. Hsieh, “Using side information to reliably learn low-rank matrices from missing and corrupted observations,” J. Mach. Learn. Res., vol. 19, no. 1, pp. 3005–3039, Jan. 2018.
    [2]
    N. Städler, D. J. Stekhoven, and P. Bühlmann, “Pattern alternating maximization algorithm for missing data in high-dimensional problems,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1903–1928, Jan. 2014.
    [3]
    B. Fekade, T. Maksymyuk, M. Kyryk, and M. Jo, “Probabilistic recovery of incomplete sensed data in IoT,” IEEE Internet Things J., vol. 5, no. 4, pp. 2282–2292, Aug. 2018. doi: 10.1109/JIOT.2017.2730360
    [4]
    J. Pan, C. B. Li, Y. Tang, W. Li, and X. O. Li, “Energy consumption prediction of a CNC machining process with incomplete data,” IEEE/CAA J. Autom. Sinica, vol. 8, no. 5, pp. 987–1000, May 2021. doi: 10.1109/JAS.2021.1003970
    [5]
    L. Chen, L. Q. Wang, Z. Y. Han, J. Zhao, and W. Wang, “Variational inference based kernel dynamic Bayesian networks for construction of prediction intervals for industrial time series with incomplete input,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 5, pp. 1437–1445, Sep. 2020.
    [6]
    Z. C. Feng, W. He, Z. J. Zhou, X. J. Ban, C. H. Hu, and X. X. Han, “A new safety assessment method based on belief rule base with attribute reliability,” IEEE/CAA J. Autom. Sinica, vol. 8, no. 11, pp. 1774–1785, Nov. 2021. doi: 10.1109/JAS.2020.1003399
    [7]
    R. C. Merton, “A simple model of capital market equilibrium with incomplete information,” J. Finance, vol. 42, no. 3, pp. 483–510, Jul. 1987. doi: 10.1111/j.1540-6261.1987.tb04565.x
    [8]
    P. J. García-Laencina, J. L. Sancho-Gómez, and A. R. Figueiras-Vidal, “Pattern classification with missing data: A review,” Neural Comput. Appl., vol. 19, no. 2, pp. 263–282, Mar. 2010. doi: 10.1007/s00521-009-0295-6
    [9]
    R. J. A. Little and D. B. Rubin, Statistical Analysis with Missing Data. 3rd ed. New York, USA: Wiley, 2019.
    [10]
    D. Shen, “Iterative learning control with incomplete information: A survey,” IEEE/CAA J. Autom. Sinica, vol. 5, no. 5, pp. 885–901, Sep. 2018. doi: 10.1109/JAS.2018.7511123
    [11]
    D. Bertsimas, C. Pawlowski, and Y. D. Zhuo, “From predictive methods to missing data imputation: An optimization approach,” J. Mach. Learn. Res., vol. 18, no. 1, pp. 7133–7171, Jan. 2017.
    [12]
    D. Williams, X. J. Liao, Y. Xue, L. Carin, and B. Krishnapuram, “On classification with incomplete data,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 3, pp. 427–436, Mar. 2007. doi: 10.1109/TPAMI.2007.52
    [13]
    Z. Ghahramani and M. I. Jordan, “Supervised learning from incomplete data via an EM approach,” in Proc. 6th Int. Conf. Neural Information Processing Systems, Denver, USA, 1993, pp. 120−127.
    [14]
    J. R. Quinlan, C4.5: Programs for Machine Learning. Amsterdam, the Netherlands: Elsevier, 2014.
    [15]
    M. Kuhn and K. Johnson, Applied Predictive Modeling. New York, USA: Springer, 2013.
    [16]
    G. De'ath and K. E. Fabricius, “Classification and regression trees: A powerful yet simple technique for ecological data analysis,” Ecology, vol. 81, no. 11, pp. 3178–3192, Nov. 2000. doi: 10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2
    [17]
    Patidar and A. Tiwari, “Handling missing value in decision tree algorithm,” Int. J. Comput. Appl., vol. 70, no. 13, pp. 31–36, May 2013.
    [18]
    C. P. Lim, J. H. Leong, and M. M. Kuan, “A hybrid neural network system for pattern classification tasks with missing features,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 4, pp. 648–653, Apr. 2005. doi: 10.1109/TPAMI.2005.64
    [19]
    K. Pelckmans, J. De Brabanter, J. A. K. Suykens, and B. De Moor, “Handling missing values in support vector machine classifiers,” Neural Netw., vol. 18, no. 5-6, pp. 684–692, Jul-Aug. 2005. doi: 10.1016/j.neunet.2005.06.025
    [20]
    T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” in Proc. 1st Int. Conf. Learning Representations, Scottsdale, USA, 2013.
    [21]
    S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. 3rd ed. Essex, UK: Pearson Education Limited, 2016.
    [22]
    L. S. Chan and O. J. Dunn, “The treatment of missing values in discriminant analysis-I. The sampling experiment,” J. Am. Stat. Assoc., vol. 67, no. 338, pp. 473–477, Jun. 1972.
    [23]
    L. P. Brás and J. C. Menezes, “Improving cluster-based missing value estimation of DNA microarray data,” Biomol. Eng., vol. 24, no. 2, pp. 273–282, Jun. 2007. doi: 10.1016/j.bioeng.2007.04.003
    [24]
    J. Luengo, J. A. Sáez, and F. Herrera, “Missing data imputation for fuzzy rule-based classification systems,” Soft Comput., vol. 16, no. 5, pp. 863–881, May 2012. doi: 10.1007/s00500-011-0774-4
    [25]
    S. G. Liu, J. Zhang, Y. Xiang, and W. L. Zhou, “Fuzzy-based information decomposition for incomplete and imbalanced data learning,” IEEE Trans. Fuzzy Syst., vol. 25, no. 6, pp. 1476–1490, Dec. 2017. doi: 10.1109/TFUZZ.2017.2754998
    [26]
    B. Muzellec, J. Josse, C. Boyer, and M. Cuturi, “Missing data imputation using optimal transport,” in Proc. 37th Int. Conf. Machine Learning, 2020, pp. 7130−7140.
    [27]
    D. B. Rubin, Multiple Imputation for Nonresponse in Surveys. New York, USA: John Wiley & Sons Inc., 2004.
    [28]
    S. van Buuren and K. Groothuis-Oudshoorn, “mice: Multivariate imputation by chained equations in R,” J. Stat. Softw., vol. 45, no. 3, pp. 1–67, Dec. 2011.
    [29]
    J. Yoon, J. Jordon, and M. van der Schaar, “GAIN: Missing data imputation using generative adversarial nets,” in Proc. 35th Int. Conf. Machine Learning, Stockholm, Sweden, 2018, pp. 5689−5698.
    [30]
    D. J. Stekhoven and P. Bühlmann, “MissForest-non-parametric missing value imputation for mixed-type data,” Bioinformatics, vol. 28, no. 1, pp. 112–118, Jan. 2012. doi: 10.1093/bioinformatics/btr597
    [31]
    Z. G. Liu, Q. Pan, G. Mercier, and J. Dezert, “A new incomplete pattern classification method based on evidential reasoning,” IEEE Trans. Cybern., vol. 45, no. 4, pp. 635–646, Apr. 2014.
    [32]
    I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proc. 27th Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2014, pp. 2672−2680.
    [33]
    A. P. Dempster, “Upper and lower probabilities induced by a multivalued mapping,” Ann. Math. Stat., vol. 38, no. 2, pp. 325–339, Apr. 1967. doi: 10.1214/aoms/1177698950
    [34]
    G. Shafer, A Mathematical Theory of Evidence. Princeton, USA: Princeton University Press, 1976.
    [35]
    B. Quost, M. H. Masson, and T. Denoeux, “Classifier fusion in the Dempster-Shafer framework using optimized t-norm based combination rules,” Int. J. Approx. Reason., vol. 52, no. 3, pp. 353–374, Mar. 2011. doi: 10.1016/j.ijar.2010.11.008
    [36]
    D. Mercier, G. Cron, T. Denoeux, and M. H. Masson, “Decision fusion for postal address recognition using belief functions,” Expert Syst. Appl., vol. 36, no. 3, pp. 5643–5653, Apr. 2009. doi: 10.1016/j.eswa.2008.06.058
    [37]
    F. Y. Xiao, “A new divergence measure for belief functions in D-S evidence theory for multisensor data fusion,” Inf. Sci., vol. 514, pp. 462–483, Apr. 2020. doi: 10.1016/j.ins.2019.11.022
    [38]
    Z. G. Liu, Q. Pan, J. Dezert, J. W. Han, and Y. He, “Classifier fusion with contextual reliability evaluation,” IEEE Trans. Cybern., vol. 48, no. 5, pp. 1605–1618, May 2018. doi: 10.1109/TCYB.2017.2710205
    [39]
    P. Smets, “Decision making in the TBM: The necessity of the pignistic transformation,” Int. J. Approx. Reason., vol. 38, no. 2, pp. 133–147, Feb. 2005. doi: 10.1016/j.ijar.2004.05.003
    [40]
    Y. Leung, N. N. Ji, and J. H. Ma, “An integrated information fusion approach based on the theory of evidence and group decision-making,” Inf. Fusion, vol. 14, no. 4, pp. 410–422, Oct. 2013. doi: 10.1016/j.inffus.2012.08.002
    [41]
    Z. W. Zhang, H. P. Tian, L. Z. Yan, A. Martin, and K. Zhou, “Learning a credal classifier with optimized and adaptive multiestimation for missing data imputation,” IEEE Trans. Syst., Man, Cybern.: Syst., 2021. doi: 10.1109/TSMC.2021.3090210
    [42]
    A. Martin and E. Radoi, “Effective ATR algorithms using information fusion models,” in Proc. 7th Int. Conf. Information Fusion, Stockholm, Sweden, 2004, pp. 161−166.
    [43]
    T. Denoeux, “Decision-making with belief functions: A review,” Int. J. Approx. Reason., vol. 109, pp. 87–110, Jun. 2019. doi: 10.1016/j.ijar.2019.03.009
    [44]
    M. H. Masson and T. Denoeux, “ECM: An evidential version of the fuzzy c-means algorithm,” Patt. Recognit., vol. 41, no. 4, pp. 1384–1397, Apr. 2008. doi: 10.1016/j.patcog.2007.08.014
    [45]
    Z. G. Su and T. Denoeux, “BPEC: Belief-peaks evidential clustering,” IEEE Trans. Fuzzy Syst., vol. 27, no. 1, pp. 111–123, Jan. 2019. doi: 10.1109/TFUZZ.2018.2869125
    [46]
    F. J. Li, Y. H. Qian, J. T. Wang, and J. Y. Liang, “Multigranulation information fusion: A Dempster-Shafer evidence theory-based clustering ensemble method,” Inf. Sci., vol. 378, pp. 389–409, Feb. 2017. doi: 10.1016/j.ins.2016.10.008
    [47]
    T. Denoeux, “A k-nearest neighbor classification rule based on Dempster-Shafer theory,” IEEE Trans. Syst., Man, Cybern., vol. 25, no. 5, pp. 804–813, May 1995. doi: 10.1109/21.376493
    [48]
    T. Denoeux, “Logistic regression, neural networks and Dempster-Shafer theory: A new perspective,” Knowl.-Based Syst., vol. 176, pp. 54–67, Jul. 2019. doi: 10.1016/j.knosys.2019.03.030
    [49]
    J. Zhao, R. Xue, Z. N. Dong, D. Y. Tang, and W. H. Wei, “Evaluating the reliability of sources of evidence with a two-perspective approach in classification problems based on evidence theory,” Inf. Sci., vol. 507, pp. 313–338, Jan. 2020. doi: 10.1016/j.ins.2019.08.033
    [50]
    Z. F. Ma, H. Tian, Z. C. Liu, and Z. W. Zhang, “A new incomplete pattern belief classification method with multiple estimations based on KNN,” Appl. Soft Comput., vol. 90, Article No. 106175, May 2020. doi: 10.1016/j.asoc.2020.106175
    [51]
    M. L. Seltzer, B. Raj, and R. M. Stern, “A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition,” Speech Commun., vol. 43, no. 4, pp. 379–393, Sep. 2004. doi: 10.1016/j.specom.2004.03.006
    [52]
    S. J. Choudhury and N. R. Pal, “Imputation of missing data with neural networks for classification,” Knowl.-Based Syst., vol. 182, Article No. 104838, Oct. 2019. doi: 10.1016/j.knosys.2019.07.009
    [53]
    M. J. D. Powell, “A fast algorithm for nonlinearly constrained optimization calculations,” in Numerical Analysis, G. A. Watson, Ed. Berlin, Heidelberg, Germany: Springer, 1978, pp. 144−157.
    [54]
    J. Benesty, J. D. Chen, Y. T. Huang, and I. Cohen, “Pearson correlation coefficient,” in Noise Reduction in Speech Processing, I. Cohen, Y. T. Huang, J. D. Chen, and J. Benesty, Eds. Heidelberg, Germany: Springer, 2009, pp. 1−4.
    [55]
    “Spearman rank correlation coefficient,” in The Concise Encyclopedia of Statistics, New York, USA: Springer, 2008, pp. 502−505.
    [56]
    M. G. Kendall, Rank Correlation Methods. London: Griffin, 1948.
    [57]
    P. Smets, “The combination of evidence in the transferable belief model,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 12, no. 5, pp. 447–458, May 1990. doi: 10.1109/34.55104
    [58]
    R. R. Yager, “On the Dempster-Shafer framework and new combination rules,” Inf. Sci., vol. 41, no. 2, pp. 93–137, Mar. 1987. doi: 10.1016/0020-0255(87)90007-7
    [59]
    D. Dubois and H. Prade, “Representation and combination of uncertainty with belief functions and possibility measures,” Comput. Intell., vol. 4, no. 3, pp. 244–264, Sep. 1988. doi: 10.1111/j.1467-8640.1988.tb00279.x
    [60]
    J. Dezert and A. Tchamova, “On the validity of Dempster's fusion rule and its interpretation as a generalization of Bayesian fusion rule,” Int. J. Intell. Syst., vol. 29, no. 3, pp. 223–252, Mar. 2014. doi: 10.1002/int.21638
    [61]
    Y. M. Yang, “An evaluation of statistical approaches to text categorization,” Inf. Retrieval, vol. 1, no. 1–2, pp. 69–90, Apr. 1999.
    [62]
    T. Cover and P. Hart, “Nearest neighbor pattern classification,” IEEE Trans. Inf. Theory, vol. 13, no. 1, pp. 21–27, Jan. 1967. doi: 10.1109/TIT.1967.1053964
    [63]
    J. Twisk, M. de Boer, W. de Vente, and M. Heymans, “Multiple imputation of missing values was not necessary before performing a longitudinal mixed-model analysis,” J. Clin. Epidemiol., vol. 66, no. 9, pp. 1022–1028, Sep. 2013. doi: 10.1016/j.jclinepi.2013.03.017


    Figures(3)  / Tables(8)


    Highlights

    • We propose a method to handle missing values in both training and test patterns.
    • The method can characterize uncertainty and imprecision brought by incompleteness.
    • The method can be viewed as a multiple-classifier fusion strategy.
    • We consider each attribute as a sub-source to provide complementary information.
    • We provide a method to optimize the weight of each sub-source.
