EVENTSTR: A BENCHMARK DATASET AND BASELINES FOR EVENT STREAM BASED SCENE TEXT RECOGNITION

ANHUI UNIVERSITY, BEIJING INSTITUTE OF TECHNOLOGY, PENG CHENG LABORATORY, PEKING UNIVERSITY

 

Xiao Wang, Jingtao Jiang, Dong Li, Futian Wang, Lin Zhu, Yaowei Wang, Yongyong Tian, Jin Tang

 

ABSTRACT

Mainstream Scene Text Recognition (STR) algorithms are developed based on RGB cameras, which are sensitive to challenging factors such as low illumination, motion blur, and cluttered backgrounds. In this paper, we propose to recognize scene text using bio-inspired event cameras by collecting and annotating a large-scale benchmark dataset, termed EventSTR. It contains 9,928 high-definition (1280 × 720) event samples and involves both Chinese and English characters. We also benchmark multiple STR algorithms as baselines for future work to compare against. In addition, we propose a new event-based scene text recognition framework, termed SimC-ESTR. It first extracts the event features using a visual encoder and projects them into tokens using a Q-former module. More importantly, we propose to augment the vision tokens based on a memory mechanism before feeding them into the large language model.
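
The data flow described in the abstract (visual encoder, Q-former projection, memory-based augmentation, then the language model) can be illustrated with a minimal sketch. This is not the authors' implementation: the Q-former is reduced to a single cross-attention step with random stand-in queries, and the memory step (`augment_with_memory`, the blending weight `alpha`, and the random memory bank) is a hypothetical placeholder, since the abstract does not specify how the memory mechanism works. All names and sizes below are assumptions chosen for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys, values):
    """Scaled dot-product attention: one output vector per query."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


def augment_with_memory(tokens, memory, alpha=0.5):
    """Hypothetical memory step (an assumption, not the paper's design):
    blend each token with the vector it retrieves from a memory bank."""
    retrieved = attend(tokens, memory, memory)
    return [[(1 - alpha) * t + alpha * r for t, r in zip(tok, ret)]
            for tok, ret in zip(tokens, retrieved)]


random.seed(0)
dim = 8


def rand_vecs(n):
    return [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]


event_features = rand_vecs(20)  # stand-in for visual encoder output
queries = rand_vecs(4)          # stand-in for learned Q-former queries
memory = rand_vecs(16)          # stand-in memory bank

# Q-former-style projection: a fixed number of queries summarize the features.
vision_tokens = attend(queries, event_features, event_features)
# Memory augmentation before the tokens would be passed to the language model.
llm_input = augment_with_memory(vision_tokens, memory)
print(len(llm_input), len(llm_input[0]))  # prints: 4 8
```

The point of the sketch is the shape of the pipeline: a variable number of event features is compressed into a fixed number of tokens, and those tokens are enriched from a memory bank without changing their count or dimensionality, so the language model always receives an input of the same size.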
 

Source: arxiv.org

 
