A combining approach for Chinese word segmentation

Author：

Wang, Aiqing | Zhang, Sen

Indexed by：

EI; Scopus

Abstract：

In Chinese and many other Asian languages whose scripts are not based on the ASCII alphabet, words are not delimited by whitespace (spaces, tabs, etc.), so word boundaries must be reconstructed. Further syntactic analysis depends on the output of word segmentation. Ambiguity and unregistered words are the most important problems in Chinese word segmentation. In this paper we analyze the causes of ambiguity and present a one-pass scan method for detecting and correcting ambiguous cases. To deal with unregistered words and special words (such as names), we propose a combination method that can recognize new words and thereby increase accuracy. In the implementation, we use bisection search to look up words in a large dictionary (more than 40,000 entries); the average search cost per word is less than 16 operations, so the speed is satisfactory when the system is embedded in Chinese understanding systems or Chinese speech processing systems. © 2007 IEEE.
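The figure of fewer than 16 operations per lookup is consistent with bisection search over a sorted list: a list of n entries needs at most floor(log2 n) + 1 probes, which is 16 for n = 40,000. The sketch below illustrates this and is not the authors' implementation. The function name `lookup` and the synthetic stand-in dictionary are assumptions made for the example, and one "operation" is assumed to mean one probe of the word list.

```python
from typing import List, Tuple


def lookup(words: List[str], target: str) -> Tuple[bool, int]:
    """Bisection search over a sorted word list.

    Returns (found, probes), where probes counts how many
    dictionary entries were examined.
    """
    lo, hi = 0, len(words) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if words[mid] == target:
            return True, probes
        if words[mid] < target:
            lo = mid + 1   # target sorts after the middle entry
        else:
            hi = mid - 1   # target sorts before the middle entry
    return False, probes


# Hypothetical stand-in for the 40,000-entry dictionary (not real data).
dictionary = sorted(f"w{i:05d}" for i in range(40_000))

# Largest number of probes needed for any word that is in the dictionary.
worst = max(lookup(dictionary, w)[1] for w in dictionary)
```

A production system would more likely use a hash table or a trie, but a sorted array with bisection search needs no storage beyond the word list itself.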

Keyword：

Syntactics; Image segmentation; Character recognition; Natural language processing systems; Speech processing

Author Community：

[1] [Wang, Aiqing] Department of Mathematics, Qingdao Technological University, Qingdao, China
[2] [Zhang, Sen] Information and Computing Sci. Lab., Beijing University of Technology, China

Related Keywords：

Use of multi-strategy to Textual Entailment recognition
2011, Journal of Computational Information Systems
Structure and performance analysis of open domain QA system
2009, Pattern Recognition and Artificial Intelligence
Extended super function based Chinese Japanese machine translation
2009, 2009 International Conference on Natural Language Processing and Knowledge Engineering, NLP-KE 2009
C-LSTM for aspect sentiment classification
2018, IPPTA: Quarterly Journal of Indian Pulp and Paper Technical Association

Source ：

Year： 2007

Volume： 3

Page： 738-743

Language： English

Affiliated Colleges：

Faculty of Information Technology (信息学部)
