Weakly Supervised Text Attention Network for Generating Text Proposals in Scene Images

Author：

Li Rong | En Mengyi | Li Jianqiang (李建强) | Zhang haibin (张海斌)

Indexed by：

CPCI-S EI Scopus

Abstract：

Detecting and recognizing textual information in scene images is a useful but challenging task. Numerous methods have been proposed to solve the problem, and the best recent results have been attained by methods based on deep neural networks. Training such networks requires large amounts of data annotated at the bounding-box or pixel level, and producing these annotations is labor-intensive, expensive, and time-consuming. In this paper we explore the use of a weakly supervised deep neural network for generating text proposals in natural scene images. The network accepts multi-scale inputs and is trained to perform whole-image binary classification, that is, to tell whether an image contains text or not. After training, the network has learned powerful discriminative features that are capable of distinguishing text from other objects. To locate the text, a text confidence score map is generated from the feature maps of the top two convolutional layers by extracting the class activation map. The value of each pixel in the score map denotes the confidence that the pixel belongs to text. Thresholding the score map converts it to a binary mask map, whose foreground regions are probable text areas. Maximally Stable Extremal Regions (MSERs) are then extracted from these probable text areas and aggregated into groups. By processing these groups, text proposals are obtained. Experimental results show that, without using any bounding-box or pixel-level annotation, the algorithm achieves a recall rate comparable to that of some fully supervised methods on the ICDAR 2013 focused text dataset and the ICDAR 2015 incidental text dataset.
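
The score-map and mask steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single convolutional layer (the paper combines the top two), a linear "text" classifier applied after global average pooling, and a min-max normalized score map. The function names, the default threshold of 0.5, and the nearest-neighbor upsampling are all hypothetical choices made for illustration; the MSER extraction and grouping stages are omitted.

```python
import numpy as np

def class_activation_map(features, class_weights):
    """Weighted sum of feature maps for one class.

    features:      array of shape (C, H, W), conv feature maps (assumed layout)
    class_weights: array of shape (C,), weights of the 'text' class in a
                   linear classifier applied after global average pooling
    Returns an (H, W) activation map.
    """
    return np.tensordot(class_weights, features, axes=([0], [0]))

def to_score_map(cam):
    """Min-max normalize the activation map to [0, 1] (an assumed convention)."""
    lo, hi = cam.min(), cam.max()
    if hi - lo < 1e-12:
        return np.zeros_like(cam, dtype=float)
    return (cam - lo) / (hi - lo)

def upsample_nearest(score_map, factor):
    """Nearest-neighbor upsampling so the map matches the input image size."""
    return np.repeat(np.repeat(score_map, factor, axis=0), factor, axis=1)

def binary_mask(score_map, threshold=0.5):
    """Pixels at or above the threshold are treated as probable text."""
    return score_map >= threshold

# Toy example: two 4x4 feature maps, only the first one fires in the
# top-left corner, and only the first one contributes to the 'text' class.
features = np.zeros((2, 4, 4))
features[0, :2, :2] = 3.0
features[1, 2:, 2:] = 5.0
weights = np.array([1.0, 0.0])

score = to_score_map(class_activation_map(features, weights))
mask = binary_mask(upsample_nearest(score, 2), threshold=0.5)
```

In a full pipeline, the foreground of `mask` would restrict where candidate regions such as MSERs are extracted before they are grouped into proposals.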

Keyword：

weak supervision; object proposals; convolutional neural network; scene text; spatial pyramid pooling

Author Community：

[ 1 ] [Li Rong]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
[ 2 ] [En Mengyi]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
[ 3 ] [Li Jianqiang]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
[ 4 ] [Zhang haibin]Beijing Univ Technol, Coll Appl Sci, Beijing, Peoples R China

Reprint Author's Address：

[Li Rong]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China

Email：

leerong@bjut.edu.cn |
enmengyi@qq.com |
lijianiang@bjut.edu.cn |
zhanghaibin@bjut.edu.cn


Related Keywords：

Subject Independent Facial Expression Recognition: Cross-Connection and Spatial Pyramid Pooling Convolutional Neural Network
2019，International Conference on Image, Video and Signal Processing (IVSP)
A Narrow-domain Entity Recognition Method Based on Domain Relevance Measurement and Context Information
2017，IEEE/WIC/ACM International Conference on Web Intelligence (WI)
Weakly supervised training for eye fundus lesion segmentation in patients with diabetic retinopathy
2022，MATHEMATICAL BIOSCIENCES AND ENGINEERING
Video-based face recognition based on deep convolutional neural network
2019，International Conference on Image, Video and Signal Processing (IVSP)

Source ：

2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1

ISSN： 1520-5363

Year： 2017

Page： 324-330

Language： English

Cited Count：

WoS CC Cited Count： 5

SCOPUS Cited Count： 8

ESI Highly Cited Papers on the List： 0

Affiliated Colleges：

Faculty of Information Technology

Faculty of Science
