English Title
The Research of ETL and Annotation Model Construction of Derwent Patent Information
English Abstract
The quality and processing efficiency of patent datasets are the basis of patent analysis and knowledge discovery. To construct a processing model that generates high-quality patent datasets, we develop an information cleaning (ETL) and annotation model for Derwent patent information (DII) on the SQL Server BI platform. Using patent information in text form as the data source, we extract the content of every field, then construct different cleaning strategies based on function expressions, regular expressions, and loop rules to handle the problems specific to each field. The cleaned data are then annotated with SQL to transform the relational data into semantic data. The experiment shows that our model performs well in cleaning, annotating, and storing the patent information in normalized form.
Translated Keywords
patent information
extracting strategy
DII
data cleaning (ETL)
Accession Number
WF:perioarticalqbzz201308029
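
The abstract describes extracting each field from text-form patent records and then cleaning it with field-specific rules, including regular expressions. The sketch below illustrates that general idea only. It is written in Python rather than on the SQL Server BI platform the authors used, and the two-letter field tags, the `ER` end-of-record marker, the sample record, and the function names `extract_fields` and `clean_record` are all assumptions made for illustration, not details taken from the paper.

```python
import re

# Hypothetical sample record; the tag layout is an assumption for illustration.
SAMPLE = """PN US1234567-A1; WO2013000001-A1
TI Method for  cleaning
   patent data
AE ACME CORP (ACME-C)
ER
"""

# A tagged line: a two-character tag, one space, then the field content.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9]) (.*)$")

# Fields assumed to hold several values separated by semicolons.
MULTI_VALUED = {"PN"}


def extract_fields(record_text):
    """Split one text record into {tag: [raw lines]}."""
    fields = {}
    current = None
    for line in record_text.splitlines():
        if line.strip() == "ER":  # assumed end-of-record marker
            break
        match = TAG_LINE.match(line)
        if match:
            current = match.group(1)
            fields.setdefault(current, []).append(match.group(2))
        elif current and line.strip():
            # Indented continuation line belongs to the current field.
            fields[current].append(line.strip())
    return fields


def clean_record(fields):
    """Apply a per-field cleaning rule: collapse whitespace, split lists."""
    cleaned = {}
    for tag, lines in fields.items():
        text = re.sub(r"\s+", " ", " ".join(lines)).strip()
        if tag in MULTI_VALUED:
            cleaned[tag] = [p.strip() for p in text.split(";") if p.strip()]
        else:
            cleaned[tag] = text
    return cleaned


if __name__ == "__main__":
    print(clean_record(extract_fields(SAMPLE)))
```

Keeping extraction and cleaning as separate steps mirrors the extract-then-transform order of an ETL pipeline, so a new field-specific rule can be added in `clean_record` without touching the parser.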