Abstract:
The complex two-dimensional structure of mathematical expressions poses a major challenge for handwritten mathematical expression recognition (HMER). Many researchers convert the LaTeX sequence into a tree structure and then design RNN-based tree decoders to address this issue. However, RNNs struggle with long-term dependencies due to their structural characteristics. Although Transformers solve the long-term dependency problem, Transformer-based tree decoders are rarely used for HMER because attention coverage becomes significantly insufficient when the distance between parent and child nodes in the tree is large. In this paper, we propose SATD, a novel offline HMER model that incorporates a Transformer-based tree decoder to learn the implicit structural relationships in LaTeX strings. Moreover, to address the issue of distant parent-child nodes, we introduce a multi-scale attention aggregation module that refines attention weights using contextual information from different receptive fields. Experiments on the CROHME 2014/2016/2019 and HME100K datasets demonstrate performance improvements, achieving accuracy rates of 63.45%/60.42%/61.05% on the CROHME 2014/2016/2019 test sets. The source code of this work will be made publicly available at https://github.com/EnderXiao/SATD/.
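The abstract does not spell out the internals of the multi-scale attention aggregation module, so the following is only a minimal, hypothetical PyTorch sketch of the general idea it describes: attention weights over the 2D feature grid are refined by parallel convolution branches with different receptive fields and then re-normalized. The class name, kernel sizes, channel width, and residual fusion rule are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only (not the SATD source code): refine a 2D
# attention map with convolutions of different receptive fields.
import torch
import torch.nn as nn


class MultiScaleAttentionAggregation(nn.Module):
    """Hypothetical multi-scale refinement of an attention map."""

    def __init__(self, channels: int = 32, kernel_sizes=(3, 7, 11)):
        super().__init__()
        # One conv branch per receptive-field scale; padding keeps H x W.
        self.branches = nn.ModuleList(
            nn.Conv2d(1, channels, k, padding=k // 2) for k in kernel_sizes
        )
        # 1x1 conv fuses the concatenated branches back to a single map.
        self.fuse = nn.Conv2d(channels * len(kernel_sizes), 1, 1)

    def forward(self, attn: torch.Tensor) -> torch.Tensor:
        # attn: (B, H, W) attention weights over the feature grid.
        x = attn.unsqueeze(1)                         # (B, 1, H, W)
        ctx = torch.cat([b(x) for b in self.branches], dim=1)
        refined = x + self.fuse(torch.relu(ctx))      # residual refinement
        # Re-normalize so the refined weights still sum to 1 per image.
        b, _, h, w = refined.shape
        return torch.softmax(refined.view(b, -1), dim=-1).view(b, h, w)


if __name__ == "__main__":
    attn = torch.softmax(torch.randn(2, 16 * 64), dim=-1).view(2, 16, 64)
    out = MultiScaleAttentionAggregation()(attn)
    print(out.shape, out.view(2, -1).sum(dim=-1))  # (2, 16, 64), sums ~1.0
```

In this reading, the small-kernel branch sharpens local alignment while the large-kernel branches supply wider context, which is one plausible way to keep attention coverage adequate when parent and child nodes are far apart.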
Source:
VISUAL COMPUTER
ISSN: 0178-2789
Year: 2024
Impact factor: 3.500 (JCR@2022)