Accelerating De Novo Assembler WTDBG2 on Commodity Servers

Abstract:

De novo genome assembly reconstructs chromosomes from massive numbers of relatively short, fragmented reads and is fundamental to studying new species for which no reference genome exists. Wtdbg2 is a de novo assembler for long reads of up to hundreds of kilobases. It is based on the fuzzy Bruijn graph (FBG) and is ten times faster than state-of-the-art assemblers such as Canu. However, the performance of wtdbg2 still needs further improvement: 1) it requires up to terabytes of memory to compute the assembly, which is infeasible on commodity servers; 2) it requires tens of hours to assemble large datasets such as the human (Homo sapiens) genome. To address these drawbacks, we propose several optimization techniques for accelerating wtdbg2 on commodity servers, including a memory auto-tuning scheme, sequence alignment optimization, and elimination of intermediate results in the output procedure. We compare the optimized wtdbg2 with the original implementation and two state-of-the-art assemblers on real-world datasets. The experimental results demonstrate that the optimized wtdbg2 achieves maximum and average speedups of 2.31× and 1.54×, respectively. In addition, our proposed optimization reduces the memory usage of wtdbg2 by 39.5% without affecting correctness. © 2020, Springer Nature Switzerland AG.

Keywords:

Large dataset; Parallel architectures; Memory architecture; Chromosomes; Cutting tools

Authors and Affiliations:

[ 1 ] [Dun, Ming] School of Cyber Science and Technology, Beihang University, Beijing; 100191, China
[ 2 ] [Li, Yunchun] School of Cyber Science and Technology, Beihang University, Beijing; 100191, China
[ 3 ] [Li, Yunchun] School of Computer Science and Engineering, Beihang University, Beijing; 100191, China
[ 4 ] [You, Xin] School of Computer Science and Engineering, Beihang University, Beijing; 100191, China
[ 5 ] [Sun, Qingxiao] School of Computer Science and Engineering, Beihang University, Beijing; 100191, China
[ 6 ] [Luan, Zerong] College of Life Sciences and Bioengineering, Beijing University of Technology, Beijing; 100083, China
[ 7 ] [Yang, Hailong] School of Computer Science and Engineering, Beihang University, Beijing; 100191, China
[ 8 ] [Yang, Hailong] State Key Laboratory of Mathematical Engineering and Advanced Computing, Beijing University of Technology, Beijing; 100083, China

Reprint Author's Address:

[Yang, Hailong] State Key Laboratory of Mathematical Engineering and Advanced Computing, Beijing University of Technology, Beijing; 100083, China
[Yang, Hailong] School of Computer Science and Engineering, Beihang University, Beijing; 100191, China

Email:

hailong.yang@buaa.edu.cn

Related Papers:

An improved k-medoids clustering algorithm
2010, 2nd International Conference on Computer and Automation Engineering, ICCAE 2010
Robust human detection with low energy consumption in visual sensor network
2011, 2011 7th International Conference on Mobile Ad-hoc and Sensor Networks, MSN 2011
SecureMLDebugger: A privacy-preserving machine learning debugging tool
2020, 5th IEEE International Conference on Data Science in Cyberspace, DSC 2020
Research on facial expression recognition algorithm based on convolutional neural network
2019, 28th Wireless and Optical Communications Conference, WOCC 2019
Small-sample image classification method of combining prototype and margin learning
2019, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

Source: Lecture Notes in Computer Science (LNCS)

ISSN: 0302-9743

Year: 2020

Volume: 12452 LNCS

Pages: 232-246

Language: English

Affiliated Colleges:

Faculty of Environment and Life, College of Life Sciences and Bioengineering
