IEEE/CAA Journal of Automatica Sinica
Citation: H. R. Wu, X. Y. Chen, Z. F. Hu, J. Shi, S. Xu, and B. Xu, “Local-to-global causal reasoning for cross-document relation extraction,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 7, pp. 1608–1621, Jul. 2023. doi: 10.1109/JAS.2023.123540
Cross-document relation extraction (RE), as an extension of information extraction, requires integrating information from multiple documents retrieved from open domains, which contain large amounts of irrelevant or confusing noisy text. Previous studies rely on attention mechanisms to connect different text features through semantic similarity. However, similarity-based methods cannot reliably distinguish valid information within highly similar retrieved documents. How to design an effective algorithm for aggregated reasoning over confusing information with similar features remains an open issue. To address this problem, we design a novel local-to-global causal reasoning (LGCR) network for cross-document RE, which enables efficient distinguishing, filtering, and global reasoning over complex information from a causal perspective. Specifically, we propose a local causal estimation algorithm to estimate the causal effect; to our knowledge, this is the first attempt to use causal reasoning, independent of feature similarity, to distinguish confusing from valid information in cross-document RE. Furthermore, based on the estimated causal effect, we propose a causality-guided global reasoning algorithm that filters out confusing information and achieves global reasoning. Experimental results under both the closed and open settings of the large-scale dataset CodRED demonstrate that our LGCR network significantly outperforms state-of-the-art methods and validate the effectiveness of causal reasoning for processing confusing information.
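The abstract's notion of a feature-similarity-independent causal effect can be illustrated with a generic interventional ablation: remove one retrieved document and measure how the relation score changes. This is only a minimal, hypothetical sketch of the general idea — the `causal_effect` helper and the `toy_predict` scorer are invented for illustration and are not the paper's LGCR algorithm:

```python
def causal_effect(predict, docs, i):
    """Interventional estimate of document i's effect on the relation score:
    score with all documents minus score with document i removed."""
    full = predict(docs)
    ablated = predict([d for j, d in enumerate(docs) if j != i])
    return full - ablated

def toy_predict(docs):
    """Stand-in for a trained relation scorer: mean count of an evidence
    token per document (a real model would score entity-pair relations)."""
    if not docs:
        return 0.0
    return sum(doc.count("born") for doc in docs) / len(docs)

docs = [
    "Alice was born in Paris.",               # valid evidence
    "The weather in Paris is usually mild.",  # confusing/noisy text
    "City records show she was born in 1901.",  # valid evidence
]

# Valid documents raise the score (positive effect); noise lowers it,
# regardless of how similar the noisy text is to the evidence.
effects = [causal_effect(toy_predict, docs, i) for i in range(len(docs))]
```

Note that the noisy document gets a negative effect even though it shares surface features ("Paris") with the evidence, which is the kind of distinction a purely similarity-based attention score cannot make.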
[1] B. Distiawan Trisedya, G. Weikum, J. Z. Qi, and R. Zhang, “Neural relation extraction for knowledge base enrichment,” in Proc. 57th Annu. Meeting of the Association for Computational Linguistics, Florence, Italy, 2019, pp. 229–240.
[2] M. Yu, W. P. Yin, K. S. Hasan, C. dos Santos, B. Xiang, and B. W. Zhou, “Improved neural relation detection for knowledge base question answering,” in Proc. 55th Annu. Meeting of the Association for Computational Linguistics, Vancouver, Canada, 2017, pp. 571–581.
[3] H. Z. Yu, H. S. Li, D. H. Mao, and Q. Cai, “A relationship extraction method for domain knowledge graph construction,” World Wide Web, vol. 23, no. 2, pp. 735–753, Mar. 2020. doi: 10.1007/s11280-019-00765-y
[4] R. Socher, B. Huval, C. D. Manning, and A. Y. Ng, “Semantic compositionality through recursive matrix-vector spaces,” in Proc. Joint Conf. Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Korea, 2012, pp. 1201–1211.
[5] Y. K. Lin, S. Q. Shen, Z. Y. Liu, H. B. Luan, and M. S. Sun, “Neural relation extraction with selective attention over instances,” in Proc. 54th Annu. Meeting of the Association for Computational Linguistics, Berlin, Germany, 2016, pp. 2124–2133.
[6] P. D. Qin, W. R. Xu, and W. Y. Wang, “Robust distant supervision relation extraction via deep reinforcement learning,” in Proc. 56th Annu. Meeting of the Association for Computational Linguistics, Melbourne, Australia, 2018, pp. 2137–2147.
[7] J. Li, Y. P. Sun, R. J. Johnson, D. Sciaky, C.-H. Wei, R. Leaman, A. P. Davis, C. J. Mattingly, T. C. Wiegers, and Z. Y. Lu, “BioCreative V CDR task corpus: A resource for chemical disease relation extraction,” Database, vol. 2016, Art. no. baw068, May 2016.
[8] Y. Yao, D. M. Ye, P. Li, X. Han, Y. K. Lin, Z. H. Liu, Z. Y. Liu, L. X. Huang, J. Zhou, and M. S. Sun, “DocRED: A large-scale document-level relation extraction dataset,” in Proc. 57th Annu. Meeting of the Association for Computational Linguistics, Florence, Italy, 2019, pp. 764–777.
[9] W. X. Zhou, K. Huang, T. Y. Ma, and J. Huang, “Document-level relation extraction with adaptive thresholding and localized context pooling,” in Proc. 35th AAAI Conf. Artificial Intelligence, 2021, pp. 14612–14620.
[10] D. Vrandečić and M. Krötzsch, “Wikidata: A free collaborative knowledgebase,” Commun. ACM, vol. 57, no. 10, pp. 78–85, Oct. 2014. doi: 10.1145/2629489
[11] Y. Yao, J. J. Du, Y. K. Lin, P. Li, Z. Y. Liu, J. Zhou, and M. S. Sun, “CodRED: A cross-document relation extraction dataset for acquiring knowledge in the wild,” in Proc. Conf. Empirical Methods in Natural Language Processing, 2021, pp. 4452–4472.
[12] G. S. Nan, Z. J. Guo, I. Sekulic, and W. Lu, “Reasoning with latent structure refinement for document-level relation extraction,” in Proc. 58th Annu. Meeting of the Association for Computational Linguistics, 2020, pp. 1546–1557.
[13] B. Li, W. Ye, Z. H. Sheng, R. Xie, X. Y. Xi, and S. K. Zhang, “Graph enhanced dual attention network for document-level relation extraction,” in Proc. 28th Int. Conf. Computational Linguistics, 2020, pp. 1551–1560.
[14] S. Zeng, R. X. Xu, B. B. Chang, and L. Li, “Double graph based reasoning for document-level relation extraction,” in Proc. Conf. Empirical Methods in Natural Language Processing, 2020, pp. 1630–1640.
[15] W. Xu, K. H. Chen, and T. J. Zhao, “Document-level relation extraction with reconstruction,” in Proc. 35th AAAI Conf. Artificial Intelligence, 2021, pp. 14167–14175.
[16] H. R. Wu, W. Chen, S. Xu, and B. Xu, “Counterfactual supporting facts extraction for explainable medical record based diagnosis with graph network,” in Proc. Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021, pp. 1942–1955.
[17] Q. T. Wu, H. R. Zhang, J. C. Yan, and D. Wipf, “Handling distribution shifts on graphs: An invariance perspective,” in Proc. 10th Int. Conf. Learning Representations, 2022.
[18] P. Cui and S. Athey, “Stable learning establishes some common ground between causal inference and machine learning,” Nat. Mach. Intell., vol. 4, no. 2, pp. 110–115, Feb. 2022. doi: 10.1038/s42256-022-00445-z
[19] J. Pearl, Causality, 2nd ed. Cambridge, UK: Cambridge University Press, 2009.
[20] D. J. Zeng, K. Liu, S. W. Lai, G. Y. Zhou, and J. Zhao, “Relation classification via convolutional deep neural network,” in Proc. 25th Int. Conf. Computational Linguistics: Tech. Papers, Dublin, Ireland, 2014, pp. 2335–2344.
[21] L. L. Wang, Z. Cao, G. de Melo, and Z. Y. Liu, “Relation classification via multi-level attention CNNs,” in Proc. 54th Annu. Meeting of the Association for Computational Linguistics, Berlin, Germany, 2016, pp. 1298–1307.
[22] J. Feng, M. L. Huang, L. Zhao, Y. Yang, and X. Y. Zhu, “Reinforcement learning for relation classification from noisy data,” in Proc. 32nd AAAI Conf. Artificial Intelligence, New Orleans, USA, 2018, pp. 5779–5786.
[23] N. Y. Zhang, S. M. Deng, Z. L. Sun, J. Y. Chen, W. Zhang, and H. J. Chen, “Relation adversarial network for low resource knowledge graph completion,” in Proc. Web Conf., Taipei, China, 2020, pp. 1–12.
[24] H. Wang, C. Focke, R. Sylvester, N. Mishra, and W. Wang, “Fine-tune BERT for DocRED with two-step process,” arXiv preprint arXiv:1909.11898, 2019.
[25] H. Z. Tang, Y. N. Cao, Z. Y. Zhang, J. X. Cao, F. Fang, S. Wang, and P. F. Yin, “HIN: Hierarchical inference network for document-level relation extraction,” in Proc. 24th Pacific-Asia Conf. Knowledge Discovery and Data Mining, Singapore, 2020, pp. 197–209.
[26] S. Zeng, Y. T. Wu, and B. B. Chang, “SIRE: Separate intra- and inter-sentential reasoning for document-level relation extraction,” in Proc. Findings of the Association for Computational Linguistics, 2021, pp. 524–534.
[27] Q. Z. Huang, S. Q. Zhu, Y. S. Feng, Y. Ye, Y. X. Lai, and D. Y. Zhao, “Three sentences are all you need: Local path enhanced document relation extraction,” in Proc. 59th Annu. Meeting of the Association for Computational Linguistics and the 11th Int. Joint Conf. Natural Language Processing, 2021, pp. 998–1004.
[28] Z. Q. Wang, S. C. Gao, M. C. Zhou, S. Sato, J. J. Cheng, and J. H. Wang, “Information-theory-based nondominated sorting ant colony optimization for multiobjective feature selection in classification,” IEEE Trans. Cybern., 2022. doi: 10.1109/TCYB.2022.3185554
[29] S. Ramírez-Gallego, H. Mouriño-Talín, D. Martínez-Rego, V. Bolón-Canedo, J. M. Benítez, A. Alonso-Betanzos, and F. Herrera, “An information theory-based feature selection framework for big data under Apache Spark,” IEEE Trans. Syst. Man Cybern. Syst., vol. 48, no. 9, pp. 1441–1453, Sept. 2018. doi: 10.1109/TSMC.2017.2670926
[30] R. Q. Lu, X. L. Jin, S. M. Zhang, M. K. Qiu, and X. D. Wu, “A study on big knowledge and its engineering issues,” IEEE Trans. Knowl. Data Eng., vol. 31, no. 9, pp. 1630–1644, Sept. 2019. doi: 10.1109/TKDE.2018.2866863
[31] S. Athey, “Beyond prediction: Using big data for policy problems,” Science, vol. 355, no. 6324, pp. 483–485, Feb. 2017. doi: 10.1126/science.aal4321
[32] D. Wu, Y. He, X. Luo, and M. C. Zhou, “A latent factor analysis-based approach to online sparse streaming feature selection,” IEEE Trans. Syst. Man Cybern. Syst., vol. 52, no. 11, pp. 6744–6758, Nov. 2022. doi: 10.1109/TSMC.2021.3096065
[33] H. Wu, X. Luo, and M. C. Zhou, “Advancing non-negative latent factorization of tensors with diversified regularization schemes,” IEEE Trans. Serv. Comput., vol. 15, no. 3, pp. 1334–1344, Jun. 2022. doi: 10.1109/TSC.2020.2988760
[34] D. Wu, M. S. Shang, X. Luo, and Z. D. Wang, “An L1-and-L2-norm-oriented latent factor model for recommender systems,” IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 10, pp. 5775–5788, Oct. 2022. doi: 10.1109/TNNLS.2021.3071392
[35] L. Hu, S. C. Yang, X. Luo, and M. C. Zhou, “An algorithm of inductively identifying clusters from attributed graphs,” IEEE Trans. Big Data, vol. 8, no. 2, pp. 523–534, Apr. 2022.
[36] A. Shrikumar, P. Greenside, and A. Kundaje, “Learning important features through propagating activation differences,” in Proc. 34th Int. Conf. Machine Learning, Sydney, Australia, 2017, pp. 3145–3153.
[37] P. R. Rosenbaum and D. B. Rubin, “The central role of the propensity score in observational studies for causal effects,” Biometrika, vol. 70, no. 1, pp. 41–55, Apr. 1983. doi: 10.1093/biomet/70.1.41
[38] Z. Y. Shen, P. Cui, K. Kuang, B. Li, and P. X. Chen, “Causally regularized learning with agnostic data selection bias,” in Proc. 26th ACM Int. Conf. Multimedia, Seoul, Republic of Korea, 2018, pp. 411–419.
[39] K. Kuang, P. Cui, S. Athey, R. X. Xiong, and B. Li, “Stable prediction across unknown environments,” in Proc. 24th ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining, London, UK, 2018, pp. 1617–1626.
[40] D. Xu, C. W. Ruan, E. Korpeoglu, S. Kumar, and K. Achan, “Adversarial counterfactual learning and evaluation for recommender system,” in Proc. 34th Conf. Neural Information Processing Systems, Vancouver, Canada, 2020, pp. 13515–13526.
[41] M. Oberst and D. Sontag, “Counterfactual off-policy evaluation with Gumbel-max structural causal models,” in Proc. 36th Int. Conf. Machine Learning, Long Beach, USA, 2019, pp. 4881–4890.
[42] D. Kaushik, E. H. Hovy, and Z. C. Lipton, “Learning the difference that makes a difference with counterfactually-augmented data,” in Proc. 8th Int. Conf. Learning Representations, Addis Ababa, Ethiopia, 2019.
[43] T.-J. Fu, X. E. Wang, M. F. Peterson, S. T. Grafton, M. P. Eckstein, and W. Y. Wang, “Counterfactual vision-and-language navigation via adversarial path sampler,” in Proc. 16th European Conf. Computer Vision, Glasgow, UK, 2020, pp. 71–86.
[44] Q. F. Zhu, W.-N. Zhang, T. Liu, and W. Y. Wang, “Counterfactual off-policy training for neural dialogue generation,” in Proc. Conf. Empirical Methods in Natural Language Processing, 2020, pp. 3438–3448.
[45] J. C. Weiss, F. Kuusisto, K. Boyd, J. Liu, and D. Page, “Machine learning for treatment assignment: Improving individualized risk attribution,” in Proc. AMIA, San Francisco, USA, 2015, p. 1306.
[46] T. Zhao, G. Liu, D. H. Wang, W. H. Yu, and M. Jiang, “Learning from counterfactual links for link prediction,” in Proc. 39th Int. Conf. Machine Learning, Baltimore, USA, 2022, pp. 26911–26926.
[47] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proc. Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, Minnesota, 2019, pp. 4171–4186.
[48] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Proc. 31st Int. Conf. Neural Information Processing Systems, Long Beach, USA, 2017, pp. 6000–6010.
[49] C. J. Maddison, A. Mnih, and Y. W. Teh, “The concrete distribution: A continuous relaxation of discrete random variables,” in Proc. 5th Int. Conf. Learning Representations, Toulon, France, 2017.
[50] E. Jang, S. X. Gu, and B. Poole, “Categorical reparameterization with Gumbel-Softmax,” in Proc. 5th Int. Conf. Learning Representations, Toulon, France, 2017.
[51] E. J. Gumbel, Statistical Theory of Extreme Values and Some Practical Applications: A Series of Lectures. Washington, USA: US Government Printing Office, 1954.
[52] D. Alvarez-Melis and T. S. Jaakkola, “Towards robust interpretability with self-explaining neural networks,” in Proc. 32nd Int. Conf. Neural Information Processing Systems, Red Hook, USA, 2018, pp. 7786–7795.
[53] Y. F. Sun, C. M. Cheng, Y. H. Zhang, C. Zhang, L. Zheng, Z. D. Wang, and Y. C. Wei, “Circle loss: A unified perspective of pair similarity optimization,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Seattle, USA, 2020, pp. 6397–6406.
[54] D. M. Ye, Y. K. Lin, J. J. Du, Z. H. Liu, P. Li, M. S. Sun, and Z. Y. Liu, “Coreferential reasoning learning for language representation,” in Proc. Conf. Empirical Methods in Natural Language Processing, 2020, pp. 7170–7186.
[55] G. D. Duan, Y. R. Dong, J. Y. Miao, and T. X. Huang, “Position-aware attention mechanism-based bi-graph for dialogue relation extraction,” Cognit. Comput., vol. 15, no. 1, pp. 359–372, Jan. 2023. doi: 10.1007/s12559-022-10105-4
[56] F. Q. Wang, F. Li, H. Fei, J. Y. Li, S. Q. Wu, F. F. Su, W. X. Shi, D. H. Ji, and B. Cai, “Entity-centered cross-document relation extraction,” in Proc. Conf. Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 2022, pp. 9871–9881.
[57] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” in Proc. 5th Int. Conf. Learning Representations, Toulon, France, 2017.
[58] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio, “Graph attention networks,” in Proc. 6th Int. Conf. Learning Representations, Vancouver, Canada, 2018.
[59] W. L. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Proc. 31st Int. Conf. Neural Information Processing Systems, Long Beach, USA, 2017, pp. 1025–1035.
[60] J. B. Chen, L. Song, M. Wainwright, and M. Jordan, “Learning to explain: An information-theoretic perspective on model interpretation,” in Proc. 35th Int. Conf. Machine Learning, Stockholm, Sweden, 2018, pp. 883–892.
[61] J. Liang, B. Bai, Y. R. Cao, K. Bai, and F. Wang, “Adversarial infidelity learning for model interpretation,” in Proc. 26th ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining, 2020, pp. 286–296.
[62] X. Wang, Y.-X. Wu, A. Zhang, X. N. He, and T.-S. Chua, “Towards multi-grained explainability for graph neural networks,” in Proc. 35th Conf. Neural Information Processing Systems, 2021, pp. 18446–18458.
[63] S. Jain, S. Wiegreffe, Y. Pinter, and B. C. Wallace, “Learning to faithfully rationalize by construction,” in Proc. 58th Annu. Meeting of the Association for Computational Linguistics, 2020, pp. 4459–4473.