Research on Hadoop-based Massive short text clustering algorithm

Author：

Zhao, Qiang | Shi, Yuliang | Qing, Zepeng

Indexed by：

CPCI-S EI Scopus

Abstract：

Many clustering algorithms work well on small data sets of fewer than 200 data objects. However, a large database may contain millions of objects, and clustering on such a large data set may lead to biased results. As data volumes and availability continue to grow, so does the need for large-dataset analytics. Among the most commonly used clustering algorithms, k-means has proved to be one of the most popular choices, providing acceptable results in a reasonable amount of time. In this paper, we present an improved k-means algorithm with better initial centroids, and we implement this modified algorithm on the Hadoop platform. Experiments show that the improved k-means algorithm converges faster than classic k-means and that its average execution time is lower.

Keyword：

MapReduce clustering Hadoop k-means

Author Community：

[ 1 ] [Zhao, Qiang]Beijing Univ Technol, Sch Software Engn, 34 100 Pingyuan, Beijing, Peoples R China
[ 2 ] [Shi, Yuliang]Beijing Univ Technol, Sch Software Engn, 34 100 Pingyuan, Beijing, Peoples R China
[ 3 ] [Qing, Zepeng]Beijing Univ Technol, Sch Software Engn, 34 100 Pingyuan, Beijing, Peoples R China

Reprint Author's Address：

[Shi, Yuliang]Beijing Univ Technol, Sch Software Engn, 34 100 Pingyuan, Beijing, Peoples R China

Email：

qiangzhaoo@163.com |
shiyl@bjut.edu.cn |
qzp_bjut@163.com

Related Keywords：

Improved Iteration FCM Algorithm for MapReduce Research
2015, 2nd International Conference on Telecommunications and Communication Engineering (ICTCE)
MapReduce FCM clustering set algorithm
2020, CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS
Parallel Implementation of the LDA Algorithm on the Hadoop Platform
2016, Computer Engineering & Science
Research on Distributed Storage Methods for Agricultural Scientific Data
2016, Computer Engineering and Applications

Source ：

FOURTH INTERNATIONAL WORKSHOP ON PATTERN RECOGNITION

ISSN： 0277-786X

Year： 2019

Volume： 11198

Language： English

Cited Count：

WoS CC Cited Count： 7

SCOPUS Cited Count： 7

ESI Highly Cited Papers on the List： 0

WanFang Cited Count：

Chinese Cited Count：

30 Days PV： 0

Affiliated Colleges：

Faculty of Information Technology, School of Software Engineering
