Applied and Computational Engineering

- The Open Access Proceedings Series for Conferences


Proceedings of the 3rd International Conference on Signal Processing and Machine Learning

Series Vol. 4, 30 May 2023


Open Access | Article

Research on information extraction technology applied for knowledge graphs

Wei Zhou * 1
1 Renmin University of China, No. 59 Zhongguancun Street, Haidian District, Beijing 100872, P.R. China

* Author to whom correspondence should be addressed.

Applied and Computational Engineering, Vol. 4, 26-31
Published 30 May 2023. © 2023 The Author(s). Published by EWA Publishing
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Citation: Wei Zhou. Research on information extraction technology applied for knowledge graphs. ACE (2023) Vol. 4: 26-31. DOI: 10.54254/2755-2721/4/20230340.

Abstract

Information extraction is a key component of natural language processing and an essential foundation for building question-answering systems and knowledge graphs. With the development of deep learning, a growing number of new techniques are being applied to information extraction. This paper first introduces information extraction and its main tasks, then reviews the development history of information extraction techniques, and describes how different types of information extraction, including entity extraction, relationship extraction, and attribute extraction, are practiced and applied in knowledge graph construction. Finally, open problems and future research directions for information extraction are discussed.
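
To make the three extraction tasks concrete, the sketch below (illustrative only; the patterns, relation names, and example sentences are invented and are not taken from this paper) shows how even a simple pattern-based extractor turns free text into (head, relation, tail) triples of the kind a knowledge graph ingests. Modern systems replace the hand-written patterns with statistical or neural models, but the output format is the same.

import re

# Illustrative pattern-based extractor: maps sentences to (head, relation, tail)
# triples suitable for loading into a knowledge graph. The patterns, relation
# labels, and corpus below are made up for demonstration purposes.
PATTERNS = [
    # "<Person> was born in <Place>" -> (Person, born_in, Place)
    (re.compile(r"^(?P<head>[A-Z][\w ]+?) was born in (?P<tail>[A-Z][\w ]+?)\.$"), "born_in"),
    # "<Org> is headquartered in <Place>" -> (Org, headquartered_in, Place)
    (re.compile(r"^(?P<head>[A-Z][\w ]+?) is headquartered in (?P<tail>[A-Z][\w ]+?)\.$"), "headquartered_in"),
    # "<Entity> is a/an <Type>" -> (Entity, instance_of, Type), a simple attribute/typing pattern
    (re.compile(r"^(?P<head>[A-Z][\w ]+?) is an? (?P<tail>[\w ]+?)\.$"), "instance_of"),
]

def extract_triples(sentence: str):
    """Return all (head, relation, tail) triples matched in one sentence."""
    triples = []
    for pattern, relation in PATTERNS:
        match = pattern.match(sentence.strip())
        if match:
            triples.append((match.group("head"), relation, match.group("tail")))
    return triples

if __name__ == "__main__":
    corpus = [
        "Alan Turing was born in London.",
        "Acme Corp is headquartered in Beijing.",
        "Alan Turing is a mathematician.",
    ]
    graph = [t for sentence in corpus for t in extract_triples(sentence)]
    for head, relation, tail in graph:
        print(f"({head}) -[{relation}]-> ({tail})")

Running the sketch prints triples such as (Alan Turing) -[born_in]-> (London); each triple supplies two entity nodes and one labeled edge, which is exactly the unit of data that knowledge graph construction pipelines accumulate from large text collections.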

Keywords

Knowledge Graph, Information Extraction, Entity Extraction, Relationship Extraction

Data Availability

The datasets used and/or analyzed during the current study will be available from the authors upon reasonable request.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Authors who publish with this series agree to the following terms:

1. Authors retain copyright and grant the series the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.

2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.

3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open Access Instruction).

Volume Title
Proceedings of the 3rd International Conference on Signal Processing and Machine Learning
ISBN (Print)
978-1-915371-55-3
ISBN (Online)
978-1-915371-56-0
Published Date
30 May 2023
Series
Applied and Computational Engineering
ISSN (Print)
2755-2721
ISSN (Online)
2755-273X
DOI
10.54254/2755-2721/4/20230340
Copyright
© 2023 The Author(s)
Open Access
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Copyright © 2023 EWA Publishing, unless otherwise stated.