
Authors:

Huang, Shanshan | Xu, Jungang | Liu, Renfeng | Liao, Husheng (scholar profile: 廖湖声)

Indexed in:

EI; Scopus

Abstract:

With the wide adoption of the Spark big data platform, several problems have surfaced in practice, and performance optimization is one of the main ones. The Shuffle module is one of Spark's core modules, and it is also an important module in some other distributed big data computing frameworks. Its design is a key factor that directly determines the performance of a big data computing framework. The main optimization parameters of the Shuffle process involve CPU utilization, I/O read/write rate, and network transmission rate, and any one of these can become the bottleneck during application execution. Network transmission time, I/O read and write time, and CPU utilization are all closely related to the size of the data being processed. Spark therefore provides compression configuration options and several compression algorithms for users to choose from. Different compression algorithms differ in compression speed and compression ratio, but users usually keep the default configuration regardless of the application they run, so the optimal configuration is not achieved. To find the optimal compression configuration for the Shuffle process, this paper proposes a cost optimization model for the Spark Shuffle process that lets users obtain the best compression configuration before the application executes. The experimental results show that the prediction model for compression configuration has an accuracy of 58.3%, and that the proposed cost optimization model can improve performance by 48.9%. © 2017 IEEE.
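The trade-off the abstract describes (CPU time spent compressing versus disk and network time saved) can be illustrated with a toy cost function. This is only a sketch under stated assumptions and is not the model proposed in the paper: the codec names, throughput figures, ratios, and the additive cost formula are all hypothetical. In Spark itself, the relevant settings are `spark.shuffle.compress` and `spark.io.compression.codec`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codec:
    name: str
    compress_mb_s: float    # assumed compression throughput (MB/s of input)
    decompress_mb_s: float  # assumed decompression throughput (MB/s of output)
    ratio: float            # compressed size / original size (smaller is denser)


def shuffle_cost_seconds(data_mb, codec, disk_mb_s, net_mb_s):
    """Hypothetical additive cost: CPU time + disk time + network time."""
    compressed_mb = data_mb * codec.ratio
    cpu = data_mb / codec.compress_mb_s + data_mb / codec.decompress_mb_s
    io = 2 * compressed_mb / disk_mb_s  # one shuffle write plus one shuffle read
    net = compressed_mb / net_mb_s
    return cpu + io + net


def best_codec(data_mb, codecs, disk_mb_s, net_mb_s):
    """Return the codec with the lowest estimated shuffle cost."""
    return min(
        codecs,
        key=lambda c: shuffle_cost_seconds(data_mb, c, disk_mb_s, net_mb_s),
    )


# Made-up candidates: no compression, a fast codec, and a dense codec.
CODECS = [
    Codec("none", float("inf"), float("inf"), 1.0),
    Codec("fast", 400.0, 800.0, 0.6),
    Codec("dense", 100.0, 400.0, 0.4),
]

if __name__ == "__main__":
    # Slow network: moving fewer bytes matters more than CPU time.
    print(best_codec(1000, CODECS, disk_mb_s=100, net_mb_s=20).name)
    # Fast disk and network: compression overhead no longer pays off.
    print(best_codec(1000, CODECS, disk_mb_s=1000, net_mb_s=1000).name)
```

Under these made-up numbers the preferred codec changes with the hardware profile, which is the reason a single default configuration cannot be optimal for every application.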

Keywords:

Big data; Electric sparks; Optimization; Predictive analytics

Author affiliations:

  • [ 1 ] [Huang, Shanshan] School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, China
  • [ 2 ] [Huang, Shanshan] Faculty of Information Technology, Beijing University of Technology, Beijing, China
  • [ 3 ] [Xu, Jungang] School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, China
  • [ 4 ] [Liu, Renfeng] School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, China
  • [ 5 ] [Liao, Husheng] Faculty of Information Technology, Beijing University of Technology, Beijing, China


Year: 2017

Volume: 2018-January

Pages: 2931-2940

Language: English

Citation counts:

WoS Core Collection citations: 0

Scopus citations: 4

ESI Highly Cited Papers listing: 0


