IEEE/CAA Journal of Automatica Sinica
Citation:  X. H. Wang, S. S. Zhao, L. Guo, L. Zhu, C. R. Cui, and L. C. Xu, “GraphCA: Learning from graph counterfactual augmentation for knowledge tracing,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 11, pp. 2108–2123, Nov. 2023. doi: 10.1109/JAS.2023.123678 
With the growing popularity of online learning in educational settings, knowledge tracing (KT) plays an increasingly significant role. The task of KT is to help students learn more effectively by predicting their future mastery of knowledge based on their historical exercise sequences. Many related works have emerged in this field, such as Bayesian knowledge tracing and deep knowledge tracing methods. Despite this progress, existing techniques still have the following limitations: 1) Previous studies address KT by exploring only the sparse observational data distribution, while the counterfactual data distribution has been largely ignored. 2) Current works designed for KT consider either the entity relationships between questions and concepts, or the relations between two concepts, but none of them investigates the relations among students, questions, and concepts simultaneously, leading to inaccurate student modeling. To address these limitations, we propose a graph counterfactual augmentation method for knowledge tracing (GraphCA). Concretely, to capture the multiple relationships among different entities, we first unify students, questions, and concepts in graphs, and then leverage a heterogeneous graph convolutional network for representation learning. To model the counterfactual world, we conduct counterfactual transformations on students' learning graphs by changing the corresponding treatments, and then exploit the counterfactual outcomes in a contrastive learning framework. We conduct extensive experiments on three real-world datasets, and the experimental results demonstrate the superiority of our proposed GraphCA method compared with several state-of-the-art baselines.
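As a loose illustration of the contrastive step summarized above, the sketch below computes a standard InfoNCE objective between factual and counterfactual graph embeddings. The function name, the pairing of each student's factual view with its own counterfactual view (with other students in the batch as negatives), and the NumPy implementation are all assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def info_nce_loss(factual, counterfactual, temperature=0.5):
    """InfoNCE-style contrastive loss between two views of a batch of
    embeddings (one row per student). Hypothetical sketch, not GraphCA's
    exact objective."""
    # L2-normalize rows so dot products become cosine similarities.
    f = factual / np.linalg.norm(factual, axis=1, keepdims=True)
    cf = counterfactual / np.linalg.norm(counterfactual, axis=1, keepdims=True)
    sim = f @ cf.T / temperature            # pairwise similarity matrix
    # Subtract the row-wise max before exponentiating for numerical stability.
    logits = sim - sim.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    # Diagonal entries pair each factual view with its counterfactual view;
    # off-diagonal entries act as in-batch negatives.
    pos = np.diag(exp)
    return float(np.mean(-np.log(pos / exp.sum(axis=1))))
```

When the two views agree (each positive pair is the most similar pair in the batch), the loss is small; mismatched pairings drive it up, which is the signal the contrastive framework uses to pull factual and counterfactual representations of the same student together.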