Dacoop: Accelerating data-iterative applications on Map/Reduce cluster

Author：

Liang, Yi | Li, Guangrui | Wang, Lei | Hu, Yanpeng

Indexed by：

EI Scopus

Abstract：

Map/reduce is a popular parallel processing framework for massive-scale data-intensive computing. A data-iterative application is composed of a series of map/reduce jobs and needs to repeatedly process some data files across these jobs. Existing implementations of the map/reduce framework focus on performing data processing in a single pass with one map/reduce job and do not directly support data-iterative applications, particularly in terms of explicitly specifying the data that is repeatedly processed across jobs. In this paper, we propose an extended version of the Hadoop map/reduce framework called Dacoop. Dacoop extends the map/reduce programming interface to specify the repeatedly processed data, introduces a shared memory-based data cache mechanism to cache the data from its first access onward, and adopts caching-aware task scheduling so that the cached data can be shared among the map/reduce jobs of data-iterative applications. We evaluate Dacoop on two typical data-iterative applications, k-means clustering and domain rule reasoning in the semantic web, with real and synthetic datasets. Experimental results show that data-iterative applications achieve better performance on Dacoop than on Hadoop. The turnaround time of a data-iterative application can be reduced by up to 15.1%. © 2011 IEEE.
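
The abstract describes two cooperating ideas: caching repeatedly processed data on first access, and scheduling later tasks where that data is already cached. The following is a minimal sketch of that interaction only, not Dacoop's implementation. The class `ToyCluster`, the `repeated` flag, and the least-loaded fallback policy are all assumptions made for illustration; the record does not specify the actual interface or scheduling policy.

```python
from collections import defaultdict


class ToyCluster:
    """Toy model of cache-aware task placement (hypothetical, not Dacoop's code)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.cache = defaultdict(set)            # node -> names of cached files
        self.load = {n: 0 for n in self.nodes}   # node -> tasks assigned so far
        self.hits = 0
        self.misses = 0

    def schedule(self, input_file, repeated=False):
        """Pick a node for a map task that reads input_file."""
        # Prefer nodes that already hold the file in their cache.
        holders = [n for n in self.nodes if input_file in self.cache[n]]
        if holders:
            node = min(holders, key=lambda n: self.load[n])
            self.hits += 1
        else:
            # Assumed fallback: the least-loaded node reads the file from storage.
            node = min(self.nodes, key=lambda n: self.load[n])
            self.misses += 1
            # Cache on first access, but only for data the application
            # marked as repeatedly processed.
            if repeated:
                self.cache[node].add(input_file)
        self.load[node] += 1
        return node


def run_iterations(cluster, files, iterations):
    """Simulate a data-iterative application: one job per iteration over the same files."""
    placements = []
    for _ in range(iterations):
        placements.append([cluster.schedule(f, repeated=True) for f in files])
    return placements
```

In this toy model, only the first iteration misses the cache; every later job is placed on the node that cached its input, which is the effect the caching-aware scheduler is meant to achieve.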

Keyword：

Multitasking | Distributed computer systems | Scheduling algorithms | K-means clustering | Cache memory

Author Affiliations：

[ 1 ] [Liang, Yi]Department of Computer Science, Beijing University of Technology, Beijing, China
[ 2 ] [Li, Guangrui]Department of Computer Science, Beijing University of Technology, Beijing, China
[ 3 ] [Wang, Lei]Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
[ 4 ] [Hu, Yanpeng]Hwellzen Software Center, Shanghai, China

Reprint Author's Address：

Email：

yliang@bjut.edu.cn

Source ：

Year： 2011

Page： 207-214

Language： English

Cited Count：

SCOPUS Cited Count： 2

ESI Highly Cited Papers on the List： 0

Affiliated Colleges：

College of Computer Science, Faculty of Information Technology
