Abstract:
Many clustering algorithms work well on small data sets of fewer than 200 data objects. However, a large database may contain millions of objects, and clustering such a large data set may lead to biased results. As data volumes and availability continue to grow, so does the need for large-scale data analytics. Among commonly used clustering algorithms, k-means has proved to be one of the most popular choices because it provides acceptable results in a reasonable amount of time. In this paper, we present an improved k-means algorithm that selects better initial centroids, and we implement the modified algorithm on the Hadoop platform. Experiments show that the improved k-means algorithm converges faster than classic k-means and reduces the average execution time.