Research on Keyword Extraction Algorithm Using PMI and TextRank

Author：

Tao, Yang | Cui, Zhu | Jiazhe, Zhang

Indexed by：

EI Scopus

Abstract：

Keyword extraction is a basic text retrieval technique in natural language processing, which can highly summarize text content and reflect the author's writing purposes. It plays an important role in document retrieval, text classification and data mining. In this paper, we propose a TextRank algorithm based on PMI (pointwise mutual information) weighting for extracting keywords from documents. The initial transition probability of the candidate words is constructed by calculating the PMI between vocabularies, which is used for iterative calculation of the vocabulary graph model within TextRank and keyword extraction. Taking into account the mutual information between the vocabulary in the document set, the word relationship in the single document is corrected, which is helpful to improve the accuracy of document keyword extraction. Experiments show that our method achieves better performance in extracting keywords in large-scale text data. © 2019 IEEE.
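
The idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact formulas, so the sliding-window co-occurrence counts, the positive-PMI filter, the damping factor of 0.85, and the function names `pmi_table` and `pmi_textrank` are all assumptions made for illustration. PMI is estimated over a document set and then used as the edge weight in a standard weighted TextRank iteration over a single document.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(corpus, window=3):
    """Estimate PMI for word pairs from sliding windows over a document set.

    corpus: list of token lists. Returns {(a, b): pmi} with a < b,
    keeping only pairs whose PMI is positive (an assumption of this sketch).
    """
    word_counts, pair_counts, n_windows = Counter(), Counter(), 0
    for tokens in corpus:
        for i in range(max(1, len(tokens) - window + 1)):
            span = set(tokens[i:i + window])
            n_windows += 1
            word_counts.update(span)
            pair_counts.update(combinations(sorted(span), 2))
    table = {}
    for (a, b), n_ab in pair_counts.items():
        # PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ), with window-level counts
        pmi = math.log(n_ab * n_windows / (word_counts[a] * word_counts[b]))
        if pmi > 0:
            table[(a, b)] = pmi
    return table


def pmi_textrank(doc_tokens, table, window=3, damping=0.85, iterations=50):
    """Rank the words of one document with PMI-weighted TextRank.

    Words that have no positive-PMI neighbor are left out of the graph.
    Returns a list of (word, score) pairs, highest score first.
    """
    neighbors = {}
    for i, a in enumerate(doc_tokens):
        for b in doc_tokens[i + 1:i + window]:
            if a == b:
                continue
            weight = table.get((a, b) if a < b else (b, a), 0.0)
            if weight > 0:
                neighbors.setdefault(a, {})[b] = weight
                neighbors.setdefault(b, {})[a] = weight
    out_sum = {w: sum(edges.values()) for w, edges in neighbors.items()}
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        # Each neighbor v passes on its score in proportion to the edge weight
        scores = {
            w: (1 - damping) + damping * sum(
                scores[v] * wt / out_sum[v] for v, wt in edges.items())
            for w, edges in neighbors.items()
        }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy document set, made up for this example
    corpus = [
        ["keyword", "extraction", "graph"],
        ["keyword", "extraction", "rank"],
        ["image", "pixel", "filter"],
        ["image", "pixel", "edge"],
    ]
    doc = ["keyword", "extraction", "graph", "keyword", "extraction", "rank"]
    for word, score in pmi_textrank(doc, pmi_table(corpus)):
        print(f"{word}\t{score:.3f}")
```

Taking the edge weights from the whole document set rather than from the single document is what lets corpus-level statistics adjust the word relationships inside one document, which is the correction the abstract describes.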

Keyword：

Data mining; Text mining; Extraction; Information retrieval systems; Information retrieval; Iterative methods; Natural language processing systems; Text processing; Classification (of information)

Author Affiliations：

[ 1 ] [Tao, Yang] College of Computer Science and Technology, Beijing University of Technology, Beijing, China
[ 2 ] [Cui, Zhu] College of Computer Science and Technology, Beijing University of Technology, Beijing, China
[ 3 ] [Jiazhe, Zhang] College of Computer Science and Technology, Beijing University of Technology, Beijing, China

Related Articles：

A Novel Method of Chinese Text Content Analysis and Mining based on Statistical Models
2023, 2nd International Conference on Statistics, Applied Mathematics, and Computing Science, CSAMCS 2022
Topological Data Analysis of Two Cases: Text Classification and Business Customer Relationship Management
2020, 2020 4th International Workshop on Advanced Algorithms and Control Engineering, IWAACE 2020
Research on adoption of gene-disease identification and recognition phenotype process in High-Performance Computing and Applications (HPCA) research
2019, 21st IEEE International Conference on High Performance Computing and Communications, 17th IEEE International Conference on Smart City and 5th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2019
An Entity Linking Method Based on Entity Category and Word Embedding
2019, 2019 3rd International Conference on Data Mining, Communications and Information Technology, DMCIT 2019

Source ：

Year： 2019

Page： 5-9

Language： English

Cited Count：

SCOPUS Cited Count： 15

ESI Highly Cited Papers on the List： 0

Affiliated Colleges：

College of Computer Science, Faculty of Information Technology
