Indexed in:
Abstract:
Due to the virtually unlimited amount of information available on the Web, navigating the ever-growing Internet is a burdensome task for users. Web search engines have therefore become necessary tools for information searching and retrieval. However, building an efficient system to search the highly dynamic World Wide Web (WWW) is a challenge. The aim of this paper is to develop a high-performance Web search system. In this paper, we propose the architecture of a distributed, multi-threaded search system and implement it in Java on the IBM ABLE platform. Drawing on various techniques for crawling, XML, and databases, the paper places special emphasis on the design and implementation of the Web crawling model and the page extraction model.
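The abstract mentions a multi-threaded crawling model without giving its details. The Java sketch below shows one common way such a crawler can be structured; it is not the authors' implementation. It assumes an in-memory link graph in place of real HTTP fetching and HTML page extraction, covers only the multi-threaded part (not distribution across machines), and the names `CrawlerSketch`, `crawl`, and `fetchLinks` are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a multi-threaded crawler over an in-memory link graph.
public class CrawlerSketch {

    // Hypothetical stand-in for "fetch page, extract links";
    // a real crawler would perform HTTP requests and HTML parsing here.
    static List<String> fetchLinks(Map<String, List<String>> web, String url) {
        return web.getOrDefault(url, List.of());
    }

    // Crawls every page reachable from seed using a fixed pool of worker threads.
    public static Set<String> crawl(Map<String, List<String>> web, String seed, int threads) {
        Set<String> visited = ConcurrentHashMap.newKeySet(); // thread-safe "seen" set
        AtomicInteger pending = new AtomicInteger();         // tasks submitted but not yet finished
        CountDownLatch done = new CountDownLatch(1);         // released when pending reaches 0
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        submit(web, seed, visited, pending, done, pool);
        try {
            done.await();
            pool.shutdown();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow(); // abandon the crawl; visited holds partial results
        }
        return visited;
    }

    private static void submit(Map<String, List<String>> web, String url, Set<String> visited,
                               AtomicInteger pending, CountDownLatch done, ExecutorService pool) {
        if (!visited.add(url)) {
            return; // another thread already claimed this URL
        }
        pending.incrementAndGet();
        pool.execute(() -> {
            try {
                for (String link : fetchLinks(web, url)) {
                    submit(web, link, visited, pending, done, pool);
                }
            } finally {
                // Children are counted before their parent finishes,
                // so reaching 0 means the whole frontier is exhausted.
                if (pending.decrementAndGet() == 0) {
                    done.countDown();
                }
            }
        });
    }
}
```

Calling `CrawlerSketch.crawl(web, "a", 4)` on a small map of page-to-links returns the set of pages reachable from `"a"`. The shared visited set ensures each URL is claimed by exactly one thread, and the pending counter lets the coordinating thread detect when no work remains.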
Keywords:
Corresponding author information:
Email address:
Source:
PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON INNOVATION & MANAGEMENT, VOLS I AND II
Year: 2007
Pages: 1837-1842
Language: English
Affiliated department: