SMConf: One-Size-Fit-Bunch, Automated Memory Capacity Configuration for In-Memory Data Analytic Platform

Author：

Liang, Yi | Zeng, Shaokang | Xu, Xiaoxian | Chang, Shilu | Su, Xing

Indexed by：

EI Scopus SCIE

Abstract：

Spark is the most popular in-memory processing framework for big data analytics. Memory is the crucial resource for workloads to achieve performance acceleration on Spark. The existing memory capacity configuration approach in Spark statically configures the memory capacity for workloads based on the user's specifications. However, without deep knowledge of a workload's system-level characteristics, users in practice often conservatively overestimate the memory utilization of their workloads and ask the resource manager to grant a larger memory share than they actually need, which leads to severe waste of memory resources. To address this issue, SMConf, an automated memory capacity configuration solution for in-memory computing workloads in Spark, is proposed. SMConf is designed based on the observation that, although there is no one-size-fit-all proper configuration, a one-size-fit-bunch configuration can be found for in-memory computing workloads. SMConf classifies typical Spark workloads into categories based on metrics across the layers of the Spark system stack. For each workload category, an individual memory requirement model is learned from the workload's input data size and the strongly correlated configuration parameters. For an ad-hoc workload, SMConf uses small-sized input data to match its memory requirement signature to one of the workload categories and determines its proper memory capacity configuration with the corresponding memory requirement model. Experimental results demonstrate that, compared to the conservative default configuration, SMConf can reduce the memory resource provision to Spark workloads by up to 69% with only slight performance degradation, and reduce the average turnaround time of Spark workloads by up to 55% in multi-tenant environments.
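
The classify-then-model idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the category names, the two-value signature, the nearest-centroid matching, the linear model form, and every coefficient below are hypothetical placeholders chosen only to show the flow from "match a workload to a category" to "apply that category's memory requirement model".

```python
import math
from dataclasses import dataclass


@dataclass
class Category:
    """One workload category with its own memory requirement model.

    All fields are hypothetical: the abstract does not specify the
    signature features or the form of the model.
    """
    name: str
    centroid: tuple          # assumed signature, e.g. (cache ratio, shuffle ratio)
    intercept_mb: float      # assumed linear model: base memory
    mb_per_gb_input: float   # assumed linear model: cost per GB of input data
    mb_per_partition: float  # assumed linear model: cost per partition

    def predict_memory_mb(self, input_gb, partitions):
        # Linear model over input size and one configuration parameter.
        return (self.intercept_mb
                + self.mb_per_gb_input * input_gb
                + self.mb_per_partition * partitions)


# Made-up categories and coefficients, for illustration only.
CATEGORIES = [
    Category("cache-heavy", (0.8, 0.1), 512.0, 300.0, 2.0),
    Category("shuffle-heavy", (0.2, 0.7), 256.0, 120.0, 4.0),
]


def match_category(signature, categories):
    # Nearest centroid by Euclidean distance (an assumed matching rule).
    return min(categories, key=lambda c: math.dist(signature, c.centroid))


def recommend_memory_mb(signature, input_gb, partitions, categories=CATEGORIES):
    # Match the signature measured on a small input, then apply the
    # matched category's model to the full input size.
    category = match_category(signature, categories)
    return category.name, category.predict_memory_mb(input_gb, partitions)
```

In such a sketch, the signature would be measured on a small-input trial run, and the predicted value could then be used to choose a memory setting such as Spark's `spark.executor.memory` for the full-size run.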

Keyword：

automated configuration; memory capacity; Spark

Author Affiliations：

[ 1 ] [Liang, Yi]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 2 ] [Zeng, Shaokang]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 3 ] [Chang, Shilu]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 4 ] [Su, Xing]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 5 ] [Xu, Xiaoxian]Univ Zurich, Dept Informat, CH-8050 Zurich, Switzerland

Reprint Author's Address：

[Liang, Yi]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China

Email：

yliang@bjut.edu.cn


Related Keywords：

A Fine-Grained Task Monitoring Mechanism in Spark Platform
2017, International Conference on Advances in Materials, Machinery, Electrical Engineering (AMMEE)
An Execution Time Prediction Model for Batch Processing Applications on Spark
2021, Li Shuo
Comparatively investigating the leading and trailing spark plug on the hydrogen rotary engine
2022, FUEL
Numerical study on ignition amelioration of a hydrogen-enriched Wankel engine under lean-burn condition
2019, APPLIED ENERGY

Source ：

CMC-COMPUTERS MATERIALS & CONTINUA

ISSN： 1546-2218

Year： 2021

Issue： 2

Volume： 66

Page： 1697-1717

Impact Factor： 3.100 (JCR@2022)

ESI HC Threshold：87

JCR Journal Grade：2

Cited Count：

WoS CC Cited Count： 0

SCOPUS Cited Count：

ESI Highly Cited Papers on the List： 0

WanFang Cited Count：

Chinese Cited Count：

30 Days PV： 2

Affiliated Colleges：

Faculty of Information Technology
