Abstract:
To overcome the shortcomings of the traditional RSJ (Reduce Side Join) algorithm on the MapReduce framework, this paper proposes an optimized algorithm that improves the efficiency of RSJ by using DistributedCache. The algorithm adopts the idea of the bitmap algorithm. Specifically, before the traditional RSJ runs, it extracts and compresses the join attributes of one of the tables into "background" data, and then uses DistributedCache to distribute that data to every node. The "background" data makes it possible to filter out records that have no match in that table before they are shuffled, so both network communication and computation cost are reduced, which increases the efficiency of RSJ.
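
The filtering idea can be illustrated with a small single-process sketch. This is not the paper's implementation: it simulates the map and reduce phases in plain Python rather than on Hadoop, and the bitmap size, the CRC32 hash, and all function names (`build_bitmap`, `may_contain`, `filtered_rsj`) are assumptions made for illustration. In a real job, the bitmap would be the "background" data shipped to each node through DistributedCache.

```python
import zlib
from collections import defaultdict

BITMAP_BITS = 1024  # hypothetical size; a real job would tune this to the key count


def bit_index(key):
    # Map a join key to a bit position (CRC32 is an assumed hash choice).
    return zlib.crc32(str(key).encode("utf-8")) % BITMAP_BITS


def build_bitmap(keys):
    # Compress the join keys of one table into a fixed-size bitmap.
    bitmap = bytearray(BITMAP_BITS // 8)
    for key in keys:
        i = bit_index(key)
        bitmap[i // 8] |= 1 << (i % 8)
    return bitmap


def may_contain(bitmap, key):
    # False means the key is definitely absent; True may be a hash collision.
    i = bit_index(key)
    return bool(bitmap[i // 8] & (1 << (i % 8)))


def map_phase(left, right, bitmap):
    # Tag each record with its source table, as in a plain reduce-side join,
    # but drop right-table records whose key cannot occur in the left table.
    for key, value in left:
        yield key, ("L", value)
    for key, value in right:
        if may_contain(bitmap, key):
            yield key, ("R", value)


def reduce_phase(tagged):
    # Group by key, then pair every left value with every right value.
    groups = defaultdict(lambda: {"L": [], "R": []})
    for key, (tag, value) in tagged:
        groups[key][tag].append(value)
    joined = []
    for key in sorted(groups):
        for left_value in groups[key]["L"]:
            for right_value in groups[key]["R"]:
                joined.append((key, left_value, right_value))
    return joined


def filtered_rsj(left, right):
    # Returns the join result and the number of records that were "shuffled".
    bitmap = build_bitmap(key for key, _ in left)
    shuffled = list(map_phase(left, right, bitmap))
    return reduce_phase(shuffled), len(shuffled)
```

Because a hashed bitmap can report false positives but never false negatives, a few unmatched records may still reach the reducers. The join result stays correct, since the reduce phase pairs records only on equal keys; the bitmap only reduces how much data is shuffled.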