Abstract:
The reconstruction of cross-cut shredded text documents (RCCSTD) is an important forensic problem with real consequences for information security and judicial investigations. It can be considered a special kind of greedy square jigsaw puzzle and has attracted the attention of many researchers. Clustering fragments into rows is a crucial and difficult step in RCCSTD, yet existing approaches achieve low clustering accuracy. This paper therefore proposes a new clustering algorithm based on horizontal projection and a constrained seed K-means algorithm to improve clustering accuracy. The constrained seed K-means algorithm draws upon expert knowledge and has the following characteristics: 1) the first fragment in each row is easy to distinguish, and the unidimensional signals extracted from it can be used as the initial clustering center; 2) two or more prior fragments cannot be clustered together. To improve the splicing accuracy within rows, a penalty coefficient is added to a traditional cost function. Experiments were carried out on 10 text documents. The clustering accuracy was 99.1% and the overall splicing accuracy was 91.0%. Compared with two other approaches, the algorithm offered significantly improved clustering accuracy, and it obtained the best results on the RCCSTD problem in our experiments. Moreover, we attempted a more complex and realistic problem, the reconstruction of cross-cut shredded dual text documents (RCCSDTD), and obtained satisfactory results in some cases. To the best of the authors' knowledge, our method is the first feasible approach to the RCCSDTD problem. The developed system is fundamentally an expert system applied specifically to RCCSTD problems. © 2019 Elsevier Ltd. All rights reserved.
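The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`horizontal_projection`, `constrained_seed_kmeans`), the binarized-pixel input format, and the use of squared Euclidean distance are all assumptions made for illustration. It shows only the two stated characteristics: seed fragments supply the initial centers, and no two seed fragments may share a cluster.

```python
from typing import Dict, List, Sequence


def horizontal_projection(fragment: Sequence[Sequence[int]]) -> List[float]:
    """Count ink pixels (assumed 1 = ink, 0 = background) in each pixel row."""
    return [float(sum(row)) for row in fragment]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def constrained_seed_kmeans(
    signals: Sequence[Sequence[float]],
    seed_indices: Sequence[int],
    max_iter: int = 50,
) -> List[int]:
    """Cluster 1-D projection signals, one cluster per seed fragment.

    Seeds give the initial centers and stay pinned to their own cluster,
    so two seed fragments can never be clustered together.
    """
    centers = [list(signals[i]) for i in seed_indices]
    pinned: Dict[int, int] = {frag: k for k, frag in enumerate(seed_indices)}
    labels = [-1] * len(signals)
    for _ in range(max_iter):
        new_labels = []
        for i, signal in enumerate(signals):
            if i in pinned:
                new_labels.append(pinned[i])
            else:
                dists = [squared_distance(signal, c) for c in centers]
                new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in range(len(centers)):
            # Never empty: each cluster always contains its pinned seed.
            members = [signals[i] for i, lab in enumerate(labels) if lab == k]
            centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy fragments: ink in the top half (row A) or the bottom half (row B).
fragments = [
    [[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 0, 0]],  # first fragment of row A (seed)
    [[0, 0, 0], [0, 0, 0], [1, 1, 1], [0, 1, 1]],  # first fragment of row B (seed)
    [[1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [1, 1, 0], [1, 1, 1]],
]
signals = [horizontal_projection(f) for f in fragments]
labels = constrained_seed_kmeans(signals, seed_indices=[0, 1])
```

Pinning each seed to its own cluster is one simple way to enforce the constraint; the paper may enforce it differently.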
Source:
EXPERT SYSTEMS WITH APPLICATIONS
ISSN: 0957-4174
Year: 2019
Volume: 127
Pages: 35-46
Impact factor: 8.500 (JCR@2022)
ESI subject: ENGINEERING
ESI highly cited threshold: 52
JCR quartile: 1