Authors:

En, MengYi | Li, Rong | Li, JianQiang (Scholar: 李建强) | Liu, Bo (Scholar: 刘博)

Indexed in:

CPCI-S, EI, Scopus

Abstract:

Features are critical for detecting text in natural scene images. Most current scene text detection algorithms leverage the feature learning power of convolutional neural networks (CNNs) to learn discriminative features that distinguish text from non-text, and perform detection based on these features. Features from the low layers of a CNN are known to be high-resolution but to have low discriminative power and little semantic information, which compromises their representational capacity. Feature maps from the high layers, on the other hand, are discriminative but coarse in resolution, which hurts the detection of small objects. In this paper, we present a feature pyramid based text detector (FPTD) for detecting scene text at different scales, especially small scales. Our framework is based on the state-of-the-art "Single Shot Detector" (SSD) framework. SSD, however, performs detection on feature maps from the later stages of the network, which are coarse in resolution, so it does not achieve satisfactory results on small objects. Our framework instead incorporates a feature pyramid mechanism into SSD. Specifically, we adopt a top-down fusion strategy to build new features that have strong semantics while keeping fine details. Text detection is conducted on each of the newly constructed feature maps during a single forward pass. The detection results from all layers are gathered and passed through non-maximum suppression (NMS). Because detection is conducted on feature maps from several layers that are at different scales but all discriminative, our framework has a strong ability to detect text at different scales. Experimental results confirm that our framework achieves competitive performance on the ICDAR2013 text localization benchmark at marginal extra cost.
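
The top-down fusion and NMS steps mentioned in the abstract can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the abstract does not name the fusion operator, the upsampling method, or the NMS threshold, so element-wise addition, nearest-neighbour 2x upsampling, and an IoU threshold of 0.5 are all assumed here. The function names (`upsample2x`, `top_down_fuse`, `iou`, `nms`) are hypothetical, and feature maps are simplified to single-channel 2-D lists.

```python
def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a 2-D map (list of rows)."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def top_down_fuse(features):
    """Fuse maps ordered fine -> coarse; each level is half the previous size.

    Assumption: fusion is element-wise addition of the finer map and the
    upsampled coarser (already fused) map.
    """
    fused = [features[-1]]                  # coarsest map is kept unchanged
    for feat in reversed(features[:-1]):    # walk towards finer resolution
        up = upsample2x(fused[-1])
        fused.append([[a + b for a, b in zip(r1, r2)]
                      for r1, r2 in zip(feat, up)])
    return fused[::-1]                      # back to fine -> coarse order


def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

In a detector of this kind, the boxes predicted on every fused level would be pooled into one list before a single call to `nms`, so that duplicate detections of the same text region at different scales are suppressed together.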

Keywords:

feature fusion; deep learning; multi-scale; CNN; scene text; feature pyramid

Author affiliations:

  • [ 1 ] [En, MengYi]Beijing Univ Technol, Fac Informat, Beijing, Peoples R China
  • [ 2 ] [Li, Rong]Beijing Univ Technol, Fac Informat, Beijing, Peoples R China
  • [ 3 ] [Li, JianQiang]Beijing Univ Technol, Fac Informat, Beijing, Peoples R China
  • [ 4 ] [Liu, Bo]Beijing Univ Technol, Fac Informat, Beijing, Peoples R China

Corresponding author:

  • [Li, Rong]Beijing Univ Technol, Fac Informat, Beijing, Peoples R China

Source:

2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR 2017), VOL 6

ISSN: 1520-5363

Year: 2017

Pages: 3-8

Language: English

Citation counts:

Web of Science Core Collection citations: 5

Scopus citations: 4

ESI Highly Cited Papers listings: 0
