One Self-Adaptive Memory Scheduling Algorithm for the Shuffle Process in Spark Platform

Author：

Xu, Jungang | Huang, Shanshan | Liu, Renfeng | Li, Pengfei (Scholars: 李鹏飞)

Indexed by：

EI Scopus

Abstract：

The Shuffle module is one of the core modules of the Spark platform, and its performance directly influences the performance and throughput of the whole platform. The existing memory scheduling algorithm for the Shuffle process allocates memory equally according to the number of tasks, without considering the different memory requirements of different tasks, which lowers memory utilization and running efficiency when data is skewed. To solve this problem, this paper proposes a self-adaptive memory scheduling algorithm for the Shuffle process (SAMSAS), which does not require task priorities to be set in advance. Instead, it adjusts memory allocation self-adaptively by constantly monitoring and learning the actual memory requirements of task execution. The experimental results show that SAMSAS improves the utilization rate of the entire memory pool and the running efficiency of each task, and in particular it effectively improves the running efficiency of the Spark platform when processing skewed data. © 2018 IEEE.
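
The contrast the abstract draws can be illustrated with a toy sketch. This is not the SAMSAS algorithm from the paper, whose details are not given in this record; it is a minimal illustration, assuming per-task memory demands have already been observed. The names `equal_share`, `adaptive_share`, `pool`, and `demands` are hypothetical and do not correspond to any Spark API.

```python
from typing import Dict


def equal_share(pool: int, demands: Dict[str, int]) -> Dict[str, int]:
    """Baseline: cap every task at pool / N, regardless of what it needs."""
    if not demands:
        return {}
    cap = pool // len(demands)
    return {task: min(need, cap) for task, need in demands.items()}


def adaptive_share(pool: int, demands: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical adaptive policy (an assumption, not the paper's method):
    start from the equal caps, then hand memory that small tasks left
    unused to the tasks whose observed demand is still unmet."""
    grants = equal_share(pool, demands)
    spare = pool - sum(grants.values())
    hungry = [t for t in demands if grants[t] < demands[t]]
    while spare > 0 and hungry:
        # Split the spare memory evenly among tasks that still want more.
        share = max(spare // len(hungry), 1)
        for t in hungry:
            extra = min(share, demands[t] - grants[t], spare)
            grants[t] += extra
            spare -= extra
        hungry = [t for t in hungry if grants[t] < demands[t]]
    return grants


if __name__ == "__main__":
    # Skewed workload: one task needs far more memory than the others.
    observed = {"t1": 100, "t2": 100, "t3": 700, "t4": 50}
    for policy in (equal_share, adaptive_share):
        grants = policy(1000, observed)
        print(policy.__name__, grants, "used:", sum(grants.values()))
```

With a skewed workload, the equal-cap baseline leaves part of the pool idle while the large task is starved; the redistribution step is one simple way to reclaim that idle memory.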

Keyword：

Data handling Efficiency Memory architecture Big data

Author Community：

[ 1 ] [Xu, Jungang]University of Chinese Academy of Sciences, Beijing, China
[ 2 ] [Huang, Shanshan]Beijing University of Technology, Beijing, China
[ 3 ] [Liu, Renfeng]University of Chinese Academy of Sciences, Beijing, China
[ 4 ] [Li, Pengfei]University of Chinese Academy of Sciences, Beijing, China

Related Keywords：

Research on key technologies of urban road safety under the condition of big data
2016, International Journal of Simulation: Systems, Science and Technology
Design and implementation of air quality data processing system based on big data technology
2018, 4th IEEE International Conference on Computer and Communications, ICCC 2018
Understanding data partition for applications on CPU-GPU integrated processors
2018, 13th International Conference on Mobile Ad-hoc and Sensor Networks, MSN 2017
Accelerating parallel ALS for collaborative filtering on Hadoop
2020, 2nd International Symposium on Benchmarking, Measuring, and Optimization, Bench 2019

Year： 2018

Page： 3938-3946

Language： English

SCOPUS Cited Count： 1

ESI Highly Cited Papers on the List： 0

Affiliated Colleges：

Faculty of Urban Construction (城市建设学部)
