A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 10, Issue 3, Mar. 2023

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 11.8, Top 4% (SCI Q1)
  • CiteScore: 17.6, Top 3% (Q1)
  • Google Scholar h5-index: 77, Top 5
D. Yu, M. Y. Zhang, M. T. Li, F. S. Zha, J. G. Zhang, L. N. Sun, and K. Q. Huang, “Squeezing more past knowledge for online class-incremental continual learning,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 3, pp. 722–736, Mar. 2023. doi: 10.1109/JAS.2023.123090

Squeezing More Past Knowledge for Online Class-Incremental Continual Learning

doi: 10.1109/JAS.2023.123090
Funds:  This work was supported in part by the National Natural Science Foundation of China (U2013602, 61876181, 51521003), the National Key R&D Program of China (2020YFB13134), Shenzhen Science and Technology Research and Development Foundation (JCYJ20190813171009236), Beijing Nova Program of Science and Technology (Z191100001119043), and the Youth Innovation Promotion Association, Chinese Academy of Sciences
  • Continual learning (CL) studies the problem of accumulating knowledge over time from a stream of data. A crucial challenge is that neural networks suffer from performance degradation on previously seen data, known as catastrophic forgetting, because all tasks share the same parameters. In this work, we consider a more practical online class-incremental CL setting, where the model learns new samples in an online manner and may continuously encounter new classes; moreover, prior knowledge is unavailable during training and evaluation. Existing works usually exploit stored samples along a single dimension, which ignores much valuable supervisory information. To better tackle this setting, we propose a novel replay-based CL method that leverages the multi-level representations produced while training on samples and strengthens supervision to consolidate previous knowledge. Specifically, besides the raw past samples, we store the corresponding logits and features in the memory. Furthermore, to imitate the predictions of the past model, we construct extra constraints from the multi-level information stored in the memory. With the same number of samples for replay, our method can therefore use more past knowledge to prevent interference. We conduct extensive evaluations on several popular CL datasets, and the experiments show that our method consistently outperforms state-of-the-art methods across various sizes of episodic memory. We further provide a detailed analysis of these results and demonstrate that our method is more viable in practical scenarios.
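The abstract's core idea — an episodic memory that stores logits and features alongside each raw sample — can be sketched as a reservoir-sampled buffer of multi-level tuples. This is a minimal illustration, not the authors' code; the class and method names are hypothetical.

```python
import random

class MultiLevelMemory:
    """Episodic memory that stores, for each retained past sample, the raw
    input and label plus the logits and intermediate features produced when
    the model trained on it (illustrative sketch of the abstract's idea)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []   # list of (x, y, logits, features) tuples
        self.n_seen = 0    # total number of stream samples observed so far

    def update(self, x, y, logits, features):
        """Reservoir sampling: keeps a uniform random subset of the stream,
        a common choice for online CL where the stream length is unknown."""
        self.n_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append((x, y, logits, features))
        else:
            j = random.randrange(self.n_seen)
            if j < self.capacity:
                self.buffer[j] = (x, y, logits, features)

    def sample(self, k):
        """Draw a replay mini-batch of up to k stored tuples."""
        return random.sample(self.buffer, min(k, len(self.buffer)))
```

Because each stored tuple carries three levels of supervision (label, logits, features), replaying the same number of samples exposes the learner to strictly more past knowledge than a labels-only buffer of equal size.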



    Figures(6)  / Tables(7)

    Article Metrics

Article views (709)  PDF downloads (75)

    Highlights

    • We propose a hierarchical replay-based method for online class-incremental CL
    • Besides the previous raw samples, we store the corresponding logits and features and construct multi-level constraints to imitate the predictions of the past model
    • Replaying the same number of samples, our method achieves state-of-the-art performance
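The multi-level constraints in the highlights above can be sketched as a simple auxiliary loss: beyond the usual classification loss on replayed samples, the current model is penalized for drifting from the logits and features stored in memory. This is a hedged sketch; the squared-error form and the coefficient names `alpha`/`beta` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def multi_level_replay_loss(cur_logits, cur_feats, mem_logits, mem_feats,
                            alpha=0.5, beta=0.5):
    """Extra replay constraints: imitate the past model's predictions
    (stored logits) and preserve its representations (stored features).
    alpha/beta weight the two terms; values here are placeholders."""
    logit_term = np.mean((cur_logits - mem_logits) ** 2)  # prediction imitation
    feat_term = np.mean((cur_feats - mem_feats) ** 2)     # feature preservation
    return alpha * logit_term + beta * feat_term
```

In training, this term would be added to the classification loss on each replay mini-batch, so the same replayed samples supervise the model at the label, logit, and feature levels simultaneously.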
