OCR Survey

mac2026-02-27 15

OCR Development Trends

Scene text detection

Scene text recognition

End-to-end scene text recognition

Scene Text Detection

Example methods:

Regression-based methods: Gupta et al, CVPR 2016; Tian et al, ECCV 2016; Shi, Bai, et al, ICCV 2017; Liu et al, CVPR 2017; Liao et al, AAAI 2017; Hu et al, ICCV 2017 ...

Segmentation-based methods: Zhong et al, CVPR 2016; Zhou et al, CVPR 2017; Wu et al, ICCV 2017; Deng et al, AAAI 2018; X Li, CVPR 2019; W Wang, et al, CVPR 2019 ...

Hybrid methods (segmentation + regression): He et al, ICCV 2017; Lyu et al, CVPR 2018; Liao et al, CVPR 2018; Long et al, ECCV 2018; Liu et al, IJCAI 2019 ...

Development trend:

Horizontal rectangular boxes \(\longrightarrow\) multi-oriented rectangles \(\longrightarrow\) multi-oriented quadrilaterals \(\longrightarrow\) curved text \(\longrightarrow\) arbitrary shapes
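The first steps of this progression can be illustrated with a small geometry sketch: a multi-oriented rectangle (center, size, angle) expands into a four-point quadrilateral, and a horizontal box is simply the axis-aligned envelope of that quadrilateral. The function names and the `(cx, cy, w, h, theta)` parameterization are assumptions made for illustration; individual papers use their own encodings.

```python
import math

def rotated_rect_to_quad(cx, cy, w, h, theta_deg):
    """Corners of a w x h rectangle rotated by theta_deg about its center.

    Uses the standard 2D rotation matrix; whether the rotation looks
    clockwise depends on the axis convention (image y usually points down).
    """
    t = math.radians(theta_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    # Corner offsets of the unrotated rectangle: TL, TR, BR, BL.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
            for dx, dy in offsets]

def quad_to_horizontal_box(quad):
    """Axis-aligned envelope (x_min, y_min, x_max, y_max) of a quadrilateral."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return (min(xs), min(ys), max(xs), max(ys))
```

The envelope of a strongly rotated text line covers much more background than the quadrilateral itself, which is one motivation for moving beyond horizontal boxes.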

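Notes:

Segmentation-based methods have difficulty accurately separating adjacent or overlapping text.

Regression-based methods tend to detect long text incompletely.

Bounding-box regression methods require well-chosen anchor parameters.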

Anchor and RPN parameter tuning issues:

Examples of anchor-free regression methods:

Segmentation based methods

C. He et al, Direct Regression..., ICCV 2017, TIP 2018.

Z. Zhong et al, An Anchor-Free Region Proposal Network..., IJDAR 2019.

Zhi Tian, Chunhua Shen, et al, FCOS, CVPR 2019.

Chenchen Zhu, Yihui He, et al, FSAF, CVPR 2019.

Tao Kong, Fuchun Sun, et al, FoveaBox, arXiv 2019.
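As a rough illustration of what anchor-free regression means, the sketch below encodes a box as the four distances from a pixel location to the box sides and decodes it back, in the spirit of per-pixel distance regression. The function names are hypothetical, and the cited papers differ in details such as normalization and which locations count as positives.

```python
def encode_ltrb(x, y, box):
    """Regression targets at location (x, y): distances to the left, top,
    right and bottom sides of box = (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return (x - x_min, y - y_min, x_max - x, y_max - y)

def decode_ltrb(x, y, ltrb):
    """Inverse of encode_ltrb: recover the box from predicted distances."""
    l, t, r, b = ltrb
    return (x - l, y - t, x + r, y + b)

def is_inside(ltrb):
    """A location lies strictly inside the box when all four distances are positive."""
    return all(d > 0 for d in ltrb)
```

No anchor scales or aspect ratios appear anywhere in this encoding; every location inside a text region predicts its own box directly.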

Why anchor free? Most RPN-style regression methods require well-chosen anchor parameters. E.g.: SSD \(\longrightarrow\) TextBoxes (AAAI 2017)
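The sensitivity to anchor parameters can be seen in a small sketch: anchors whose aspect ratios stay close to square overlap poorly with a long word box, while a set with elongated ratios matches it well. The scale, the two ratio lists and the example box below are illustrative assumptions, not the configuration of any particular detector.

```python
import math

def make_anchors(cx, cy, scale, aspect_ratios):
    """Anchors centered at (cx, cy) with area scale**2 and width/height = ratio."""
    anchors = []
    for ar in aspect_ratios:
        w = scale * math.sqrt(ar)
        h = scale / math.sqrt(ar)
        anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def best_iou(gt, anchors):
    """Highest IoU between a ground-truth box and any anchor."""
    return max(iou(gt, a) for a in anchors)

# A 200 x 20 word box (aspect ratio 10), with anchors centered on it.
text_line = (0.0, 45.0, 200.0, 65.0)
generic = make_anchors(100.0, 55.0, 63.0, [1, 2, 3, 1.0 / 2, 1.0 / 3])
elongated = make_anchors(100.0, 55.0, 63.0, [1, 2, 3, 5, 7, 10])
print(best_iou(text_line, generic), best_iou(text_line, elongated))
```

With a typical positive-matching threshold of 0.5, the near-square set would never match this word, so the ratios have to be retuned for each kind of text. That tuning burden is what anchor-free designs try to avoid.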

Alternative anchor design? Lele Xie, Yuliang Liu, Lianwen Jin, Zecheng Xie, DeRPN: Taking a further step toward more general object detection, AAAI 2019.

Scene Text Recognition

Scene text recognition methods:

CTC-based methods:

P. He et al, AAAI 2016 (DTRN: CNN+RNN+CTC)

B. Shi et al, TPAMI 2017 (CRNN: CNN+RNN+CTC)

F. Yin et al, arXiv 2017 (CNN+CTC)

Y. Wu et al, arXiv 2018 (CNN+CTC)

Y. Liu et al, ECCV 2018 (GAN+CTC)

Attention-based methods:

C. Lee et al, CVPR 2016; B. Shi et al, CVPR 2016

X. Yang et al, IJCAI 2017

Bai et al, CVPR 2018; Liu et al, AAAI 2018

Shi et al, TPAMI 2018 (ASTER)

Luo et al, PR 2019 (MORAN)

Development trend:

Regular text \(\longrightarrow\) irregular text recognition

CTC \(\longrightarrow\) Attention (1D, 2D)

Detection + recognition \(\longrightarrow\) end-to-end detection and recognition

Attention or CTC?

CTC works better for long text; attention works better for short text.

Limitations of Attention and CTC

CTC:

Can hardly be applied directly to 2D prediction

Heavy computation for long sequences

Performance degradation on repeated patterns
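The repeated-pattern issue is easier to see from the CTC collapsing rule itself: consecutive duplicate symbols are merged and blanks are then dropped, so a genuinely doubled character survives only if a blank frame separates its two occurrences. The sketch below is a minimal greedy (best-path) collapse over an already chosen per-frame symbol sequence; the blank symbol `-` and the function names are assumptions for illustration.

```python
BLANK = "-"  # assumed blank symbol for this sketch

def ctc_collapse(path, blank=BLANK):
    """Merge consecutive duplicates, then drop blanks (CTC best-path decoding)."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)

def min_frames(label):
    """Fewest frames able to emit label: one per character plus one
    separating blank for every pair of adjacent identical characters."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

print(ctc_collapse("hh-e-ll-lo"))  # hello
print(ctc_collapse("hheelloo"))    # helo (the double l is lost without a blank)
print(min_frames("hello"))         # 6
```

Because the output is tied one-to-one to a 1D frame sequence, the same rule also hints at why CTC does not extend directly to 2D prediction.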

Attention:

Misalignment problem (attention drift)

More memory required
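For reference, a single attention decoding step can be sketched as follows. This uses plain dot-product scoring as a simplification; the cited recognizers use learned scoring functions, and all names here are hypothetical. Attention drift corresponds to the weight peak landing on the wrong feature position for the character being decoded.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, features):
    """One decoding step: score every feature vector against the query,
    normalize the scores into weights, and return the weighted-sum context."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, features))
               for d in range(dim)]
    return weights, context

features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # toy encoder outputs
weights, context = attend([0.0, 5.0], features)
print(weights.index(max(weights)))  # position the decoder is looking at
```

Each output character needs its own weight vector over all feature positions, so the stored weights grow with (output length) x (number of positions), which is one plausible source of the extra memory noted above.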

Why End2End?

Prevents error accumulation: errors can accumulate in a cascade of detection + recognition, which may lead to a large fraction of garbage predictions.

Joint optimization helps improve overall performance.

Easier to maintain and adapt to new domains: maintaining a cascaded pipeline with data and model dependencies requires substantial engineering effort.

Faster, smaller, stronger.
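The error-accumulation argument can be made concrete with a one-line calculation: a word is read correctly by a cascade only if it is first detected and then recognized. The rates used below are purely hypothetical and are not results from any paper.

```python
def cascade_word_recall(det_recall, rec_accuracy_given_detected):
    """Fraction of ground-truth words that a two-stage pipeline reads correctly:
    P(detected) * P(recognized correctly | detected)."""
    return det_recall * rec_accuracy_given_detected

# Hypothetical rates: both stages look strong alone, yet the cascade loses
# roughly one word in five.
print(cascade_word_recall(0.9, 0.9))
```

Under these assumed rates neither stage can recover from the other's mistakes, which is the gap joint training aims to narrow.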

Some new techniques to bridge the detector and the recognizer

RoI Rotate (multi-oriented e2e): X Liu, et al, FOTS, CVPR 2018

Tailored RoI pooling (resampling that preserves the aspect ratio): H Li et al, Towards End-to-End Text Spotting in Natural Scenes, arXiv 20190617 (extension of "H Li et al ICCV 2017")

RoI Masking (arbitrary-shape e2e): S Qin, A Bissacco, et al (Google AI), Towards Unconstrained End-to-End Text Spotting, ICCV 2019
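The aspect-ratio-preserving idea can be sketched as a size computation: fix the output height for the recognizer and scale the output width with the region's own width/height ratio, instead of squeezing every region into one fixed grid. The function name, default sizes and clamping limits are assumptions for illustration and are not taken from the cited papers.

```python
def pooled_size(box_w, box_h, out_h=8, min_w=1, max_w=64):
    """Output (height, width) for a text region: fixed height, width scaled
    to keep the region's aspect ratio, clamped to [min_w, max_w]."""
    if box_w <= 0 or box_h <= 0:
        raise ValueError("box dimensions must be positive")
    w = round(box_w * out_h / box_h)
    return out_h, max(min_w, min(max_w, w))

print(pooled_size(40, 20))   # short word: (8, 16)
print(pooled_size(200, 20))  # long line, clamped at max_w: (8, 64)
```

A fixed square grid would compress the 200 x 20 line horizontally by a factor of ten relative to its height, making neighboring characters hard to separate. A width that follows the region keeps character proportions roughly intact.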
