Author:

Huang, Shanshan | Xu, Jungang | Liu, Renfeng | Liao, Husheng (Scholars: 廖湖声)

Indexed by:

EI Scopus

Abstract:

With the wide adoption of the Spark big data platform, several problems have surfaced in practice, and performance optimization is one of the most important. The Shuffle module is one of Spark's core modules, and it is also an important module in some other distributed big data computing frameworks. Its design is the key factor that directly determines the performance of a big data computing framework. The main factors in optimizing the Shuffle process are CPU utilization, I/O read/write rate, and network transmission rate, and any one of them may become the bottleneck during application execution. Network transmission time, I/O read and write time, and CPU utilization are all closely related to the size of the data being processed. For this reason, Spark provides compression configuration options and several compression algorithms for users to select. Different compression algorithms differ in compression speed and compression ratio, but users usually keep the default configuration regardless of the application they run, so the optimal configuration is not achieved. To find the optimal compression configuration for the Shuffle process, this paper proposes a cost optimization model for the Spark Shuffle process that lets users obtain the best compression configuration before application execution. The experimental results show that the prediction model for compression configuration has an accuracy of 58.3%, and the proposed cost optimization model can improve performance by 48.9%. © 2017 IEEE.
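
The trade-off the abstract describes (CPU time spent compressing versus I/O and network time saved on smaller data) can be illustrated with a toy cost comparison. The sketch below is not the paper's model: the `Codec` fields, the additive cost formula, and the helper names are assumptions made for illustration, and any throughput or ratio figures passed in would have to be measured on the target cluster. Only the property names `spark.shuffle.compress` and `spark.io.compression.codec` are actual Spark configuration keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codec:
    """Hypothetical per-codec profile; real values must be measured."""
    name: str               # value for spark.io.compression.codec, e.g. "lz4"
    ratio: float            # compressed size / original size (smaller is better)
    compress_mb_s: float    # compression throughput in MB/s
    decompress_mb_s: float  # decompression throughput in MB/s


def shuffle_cost_seconds(codec, data_mb, disk_mb_s, net_mb_s):
    """Assumed additive cost: CPU (compress + decompress) + disk write + network."""
    compressed_mb = data_mb * codec.ratio
    cpu = data_mb / codec.compress_mb_s + data_mb / codec.decompress_mb_s
    io = compressed_mb / disk_mb_s
    net = compressed_mb / net_mb_s
    return cpu + io + net


def pick_codec(codecs, data_mb, disk_mb_s, net_mb_s):
    """Return the codec with the lowest estimated shuffle cost."""
    return min(
        codecs,
        key=lambda c: shuffle_cost_seconds(c, data_mb, disk_mb_s, net_mb_s),
    )


def spark_conf_for(codec):
    """Map the chosen codec onto Spark's shuffle compression properties."""
    return {
        "spark.shuffle.compress": "true",
        "spark.io.compression.codec": codec.name,
    }
```

Under this simplified formula, a codec with a better compression ratio wins when the network is slow, while a faster but weaker codec wins when disk and network bandwidth are plentiful, which is why a single default setting is unlikely to suit every application.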

Keyword:

Big data; Optimization; Electric sparks; Predictive analytics

Author Community:

  • [ 1 ] [Huang, Shanshan] School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, China
  • [ 2 ] [Huang, Shanshan] Faculty of Information Technology, Beijing University of Technology, Beijing, China
  • [ 3 ] [Xu, Jungang] School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, China
  • [ 4 ] [Liu, Renfeng] School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing, China
  • [ 5 ] [Liao, Husheng] Faculty of Information Technology, Beijing University of Technology, Beijing, China

Year: 2017

Volume: 2018-January

Page: 2931-2940

Language: English

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 4

ESI Highly Cited Papers on the List: 0
