Author:

Luo, Zhiyong | Song, Rou

Indexed by:

EI Scopus PKU CSCD

Abstract:

Disambiguation is one of the most important components of a Chinese word segmentation system. A Chinese general-purpose word segmentation (GPWS) system places particularly high demands on disambiguation, because it lets users create their own dictionaries dynamically and apply multiple user dictionaries during segmentation. Based on an examination of the distribution and characteristics of ambiguity fragments (especially overlapping ambiguity fragments) in a large-scale real-world corpus, this paper proposes an improved forward maximum matching algorithm for detecting ambiguity fragments, together with a practical 'rules + exceptions' disambiguation strategy. Overlapping ambiguity fragments were extracted exhaustively (about 2.4 million occurrences) from a People's Daily corpus of 100 million characters (approximately 234 MB), and randomized open tests of the strategy achieved an average accuracy of 99%.
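
To make the terms in the abstract concrete, here is a minimal sketch of textbook forward maximum matching and a naive check for overlapping ambiguity fragments. It is not the improved algorithm proposed in the paper, which this record does not describe. The function names, the `max_len` parameter, and the toy lexicon are all assumptions made for illustration.

```python
def forward_maximum_match(text, lexicon, max_len=4):
    """Baseline greedy segmentation: at each position, take the longest
    lexicon word; fall back to a single character if nothing matches."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def detect_overlapping_fragments(text, lexicon, max_len=4):
    """Naive detection (an illustrative assumption, not the paper's method):
    a lexicon word that straddles a boundary of the forward-maximum-match
    segmentation marks an overlapping ambiguity fragment."""
    # Character spans (start, end) of the words chosen by the baseline.
    spans = []
    pos = 0
    for word in forward_maximum_match(text, lexicon, max_len):
        spans.append((pos, pos + len(word)))
        pos += len(word)

    fragments = set()
    for start in range(len(text)):
        for size in range(2, min(max_len, len(text) - start) + 1):
            end = start + size
            if text[start:end] not in lexicon:
                continue
            # Segmented words that this lexicon word overlaps.
            covered = [(s, e) for s, e in spans if s < end and e > start]
            if len(covered) > 1:
                # The fragment runs from the first to the last covered word.
                fragments.add(text[covered[0][0]:covered[-1][1]])
    return sorted(fragments)


# Toy lexicon (hypothetical): both "从小" and "小学" are words.
toy_lexicon = {"从小", "小学"}
sentence = "他从小学起"
segmented = forward_maximum_match(sentence, toy_lexicon)
ambiguous = detect_overlapping_fragments(sentence, toy_lexicon)
```

In this toy case the greedy pass picks 从小 and leaves 学 on its own, while the lexicon word 小学 crosses that boundary, so 从小学 is reported as an overlapping ambiguity fragment. Choosing between the two readings is the job of the disambiguation strategy, which this sketch does not attempt.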

Keyword:

Algorithms; Word processing; Pattern recognition; Pattern matching; Statistical methods; Natural language processing systems

Author Affiliations:

  • [ 1 ] [Luo, Zhiyong] College of Computer Science, Beijing University of Technology, Beijing 100022, China
  • [ 2 ] [Luo, Zhiyong] College of Information Science, Beijing Language and Culture University, Beijing 100083, China
  • [ 3 ] [Song, Rou] College of Information Science, Beijing Language and Culture University, Beijing 100083, China

Source:

Computer Research and Development

ISSN: 1000-1239

Year: 2006

Issue: 6

Volume: 43

Page: 1122-1128

SCOPUS Cited Count: 7

ESI Highly Cited Papers on the List: 0
