Abstract:
The Hadoop Distributed File System (HDFS) has been widely used in clusters to build large-scale, high-performance storage systems. However, HDFS is designed mainly to handle large files, so its performance when processing massive numbers of small files is relatively low: huge numbers of small files impose a heavy metadata burden on the HDFS NameNode. Focusing on this problem, an approach to improve the I/O performance of small files on HDFS is introduced. The main idea is to merge the small files in the same directory into one large file and build an index for each small file, which improves the storage efficiency of small files and reduces the metadata burden on the NameNode. Furthermore, a cache strategy is presented to improve the reading efficiency of small files on HDFS. The relevant design, data structures, and implementation are described. Experimental results indicate that the proposed method can improve the efficiency of processing massive numbers of small files on HDFS.
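The merge-and-index idea described in the abstract can be illustrated with a minimal sketch. The function names and the (offset, length) index layout below are illustrative assumptions, not the paper's actual data structures: each small file is appended to one merged blob, and a per-file index records where its bytes live, so a read touches only one merged-file entry instead of one NameNode object per small file.

```python
import io

def merge_small_files(files):
    """Merge a {name: bytes} mapping into one blob plus an index.

    The index maps each small-file name to (offset, length) inside
    the merged blob; this stands in for the per-file index the paper
    builds when merging files from the same directory.
    """
    index = {}
    buf = io.BytesIO()
    for name, data in files.items():
        index[name] = (buf.tell(), len(data))  # record position before appending
        buf.write(data)
    return buf.getvalue(), index

def read_small_file(blob, index, name):
    """Recover one small file from the merged blob via the index."""
    offset, length = index[name]
    return blob[offset:offset + length]

# Usage: merge two small files, then read one back through the index.
files = {"a.txt": b"hello", "b.txt": b"world!"}
blob, index = merge_small_files(files)
restored = read_small_file(blob, index, "b.txt")
```

On a real cluster the merged blob would be a single HDFS file (so the NameNode tracks one object rather than thousands), and the index would be consulted on the client or DataNode side; the cache strategy the abstract mentions would additionally keep hot index entries and blocks in memory.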