
Authors:

Huang, Shanshan | Xu, Jungang | Liu, Renfeng | Liao, Husheng (Scholar: 廖湖声)

Indexed in:

CPCI-S

Abstract:

With the wide adoption of the Spark big data platform, several problems have surfaced in practical use, and performance optimization is one of the main ones. The Shuffle module is one of Spark's core modules, and a comparable module is also an important part of other distributed big data computing frameworks. Its design is a key factor that directly determines the performance of a big data computing framework. The main factors in optimizing the Shuffle process are CPU utilization, I/O read/write rate, and network transmission rate, and any one of them can become the bottleneck while an application runs. Network transmission time, I/O read and write time, and CPU utilization are all closely related to the size of the data being processed. For this reason, Spark provides compression configuration options and several compression algorithms for users to choose from. Different compression algorithms differ in compression speed and compression ratio, but users usually keep the default configuration even when they run different applications, so the optimal configuration is not achieved. To find the optimal compression configuration for the Shuffle process, this paper proposes a cost optimization model for the Spark Shuffle process that lets users obtain the best compression configuration before the application executes. The experimental results show that the prediction model for compression configuration has an accuracy of 58.3%, and the proposed cost optimization model can improve performance by 48.9%.
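The trade-off described in the abstract (CPU time spent compressing versus I/O and network time saved by moving fewer bytes) can be illustrated with a toy cost function. This is a minimal sketch and not the model proposed in the paper: the `CodecProfile` fields, the throughput figures, the codec names, and the assumption that CPU, disk, and network time simply add up are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CodecProfile:
    """Hypothetical per-codec characteristics (illustrative, not measured)."""
    name: str
    compress_mb_per_s: float    # CPU throughput when compressing
    decompress_mb_per_s: float  # CPU throughput when decompressing
    ratio: float                # uncompressed size / compressed size


def shuffle_cost_seconds(data_mb: float,
                         codec: Optional[CodecProfile],
                         disk_mb_per_s: float,
                         network_mb_per_s: float) -> float:
    """Estimate shuffle time as CPU + disk + network, assuming no overlap."""
    if codec is None:
        # No compression: no CPU cost, but the full data size is moved.
        moved_mb = data_mb
        cpu_s = 0.0
    else:
        moved_mb = data_mb / codec.ratio
        cpu_s = (data_mb / codec.compress_mb_per_s
                 + data_mb / codec.decompress_mb_per_s)
    # Data is written once on the map side and read once on the reduce side.
    disk_s = 2.0 * moved_mb / disk_mb_per_s
    network_s = moved_mb / network_mb_per_s
    return cpu_s + disk_s + network_s


def best_codec(data_mb: float,
               codecs: Sequence[CodecProfile],
               disk_mb_per_s: float,
               network_mb_per_s: float) -> Optional[CodecProfile]:
    """Return the cheapest option; None means 'do not compress'."""
    candidates = [None, *codecs]
    return min(
        candidates,
        key=lambda c: shuffle_cost_seconds(
            data_mb, c, disk_mb_per_s, network_mb_per_s),
    )


# Hypothetical profiles: a fast, low-ratio codec and a slow, high-ratio one.
FAST = CodecProfile("fast", compress_mb_per_s=400.0,
                    decompress_mb_per_s=800.0, ratio=2.0)
DENSE = CodecProfile("dense", compress_mb_per_s=50.0,
                     decompress_mb_per_s=200.0, ratio=4.0)

if __name__ == "__main__":
    for net in (5.0, 50.0, 5000.0):
        choice = best_codec(1000.0, [FAST, DENSE],
                            disk_mb_per_s=100.0, network_mb_per_s=net)
        print(net, choice.name if choice else "no compression")
```

Under these made-up numbers the preferred option changes with the network speed, which is the reason a single default setting cannot be optimal for every application. In Spark itself, the selected option would be applied through configuration properties such as `spark.shuffle.compress` and `spark.io.compression.codec`.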

Keywords:

compression configuration; cost model; Shuffle process; Spark

Author affiliations:

  • [1] [Huang, Shanshan] Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing, Peoples R China
  • [2] [Xu, Jungang] Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing, Peoples R China
  • [3] [Liu, Renfeng] Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing, Peoples R China
  • [4] [Huang, Shanshan] Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
  • [5] [Liao, Husheng] Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China

Corresponding author:

  • [Huang, Shanshan] Univ Chinese Acad Sci, Sch Comp & Control Engn, Beijing, Peoples R China; [Huang, Shanshan] Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA)

ISSN: 2639-1589

Year: 2017

Pages: 2931-2940

Language: English

Citation counts:

WoS Core Collection citations: 4

ESI Highly Cited Papers listing: 0