
Author:

Li Rong | En Mengyi | Li Jianqiang (Scholars: 李建强) | Zhang haibin (Scholars: 张海斌)

Indexed by:

CPCI-S EI Scopus

Abstract:

Detecting and recognizing textual information in scene images are useful but challenging tasks. Numerous methods have been proposed to solve the problem, and the best recent results are attained by methods based on deep neural networks. Training such networks requires large amounts of data annotated at the bounding-box or pixel level, and producing such annotations demands substantial labor, which can be expensive and time-consuming. In this paper we explore the use of a weakly supervised deep neural network for generating text proposals in natural scene images. The network accepts multi-scale inputs and is trained to perform whole-image binary classification, that is, to tell whether an image contains text or not. After training, the network has learned powerful discriminative features that are capable of distinguishing text from other objects. To locate the text, a text confidence score map is generated from the feature maps of the top two convolutional layers by extracting class activation maps. The value of each pixel in the score map denotes the confidence that the pixel belongs to text. Thresholding the score map converts it to a binary mask map whose foreground regions are probable text areas. Maximally Stable Extremal Regions (MSERs) are then extracted from these probable text areas and aggregated into groups. By processing these groups, text proposals are obtained. Experimental results show that, without using any bounding-box or pixel-level annotation, the algorithm achieves a recall rate comparable to that of some fully supervised methods on the ICDAR 2013 focused text dataset and the ICDAR 2015 incidental text dataset.
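
The score-map and thresholding steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single set of feature maps rather than the top two convolutional layers, uses the standard class activation map formulation (a weighted sum of feature maps using the classifier weights for the "text" class), and the min-max normalization, the 0.5 threshold, and all function names are hypothetical choices made for illustration. The MSER extraction and grouping stages are not covered.

```python
from typing import List

Map2D = List[List[float]]


def class_activation_map(features: List[Map2D], weights: List[float]) -> Map2D:
    """Weighted sum of feature maps: score[y][x] = sum_k weights[k] * features[k][y][x]."""
    height, width = len(features[0]), len(features[0][0])
    score = [[0.0] * width for _ in range(height)]
    for fmap, w in zip(features, weights):
        for y in range(height):
            for x in range(width):
                score[y][x] += w * fmap[y][x]
    return score


def normalize(score: Map2D) -> Map2D:
    """Min-max scale to [0, 1] (an assumed step, so one threshold fits all images)."""
    flat = [v for row in score for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0] * len(row) for row in score]
    return [[(v - lo) / span for v in row] for row in score]


def binary_mask(score: Map2D, threshold: float = 0.5) -> List[List[int]]:
    """Mark pixels whose confidence reaches the threshold as probable text (1)."""
    return [[1 if v >= threshold else 0 for v in row] for row in score]


# Toy input: two 2x2 feature maps and hypothetical "text"-class weights.
features = [
    [[0.0, 1.0], [0.0, 2.0]],
    [[1.0, 0.0], [1.0, 0.0]],
]
weights = [1.0, -1.0]

score_map = class_activation_map(features, weights)
mask = binary_mask(normalize(score_map))
print(mask)
```

In this toy input the right-hand column receives the higher scores, so it forms the foreground of the mask. In a full pipeline the foreground regions would then be passed to an MSER extractor.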

Keyword:

weak supervision; object proposals; convolutional neural network; scene text; spatial pyramid pooling

Author Community:

  • [ 1 ] [Li Rong]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
  • [ 2 ] [En Mengyi]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
  • [ 3 ] [Li Jianqiang]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
  • [ 4 ] [Zhang haibin]Beijing Univ Technol, Coll Appl Sci, Beijing, Peoples R China

Reprint Author's Address:

  • [Li Rong]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China

Source:

2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1

ISSN: 1520-5363

Year: 2017

Page: 324-330

Language: English

Cited Count:

WoS CC Cited Count: 5

SCOPUS Cited Count: 8

ESI Highly Cited Papers on the List: 0
