At ICDAR 2023, the top event in the global text recognition (OCR) field, Tencent's OCR team won four championships with its self-developed algorithms. Following its participation in 2017, 2019, and 2021, this is the fourth consecutive edition in which the team has competed and achieved excellent results. It has now won a total of 18 officially certified championships, demonstrating the world-class level of Tencent's OCR technology.
The ICDAR conference is a recognized, authoritative academic conference in the field of document image analysis and recognition. It is held every two years, and its competitions have attracted nearly 8,000 teams from more than 100 countries. The ICDAR competition is well known in China and internationally for its extremely high technical difficulty and strong practical relevance. Unlike informal leaderboard submissions made after a competition has ended, the official ICDAR-certified competition uses a brand-new dataset, does not disclose the participating teams' identities or scores while the competition is running, and limits both the time window and the number of result submissions, making it a highly difficult "blind" contest.
This year, the Tencent OCR team was formed jointly by the Tencent Data Platform Department and the WeChat Technology Architecture Department. It focused on two competitions, DSText (Dense Small Text Video Text Recognition) and SVRD (Structured Information Extraction), and won four track championships.
The DSText (Dense Small Text Video Text Recognition) competition has two tasks: video text tracking and end-to-end video text recognition. Because the text is very dense and very small, and is further affected by environmental interference (camera shake, motion blur, lighting changes, etc.) and post-editing (multi-shot cuts, artificial backgrounds, game interface switching, etc.), it is difficult to accurately detect, track, and recognize text in the video frames. The task demands highly robust algorithms and is extremely challenging. Some of the competition video frames are shown below:
ICDAR-DSText competition schematic frames
In both tasks of the DSText competition, the Tencent OCR team won the championship by a clear margin.
Task 1 aims to track all text streams in the video, associating the detection boxes that belong to the same text instance across video frames. The evaluation metric is MOTA, and Tencent won the championship with a lead of 12.04% over second place.
Video Text Tracking: Champion Certificate
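For readers unfamiliar with the MOTA metric named for Task 1, the following is a minimal sketch of the standard CLEAR MOT definition, MOTA = 1 - (FN + FP + IDSW) / GT, where the counts are summed over all frames. The function name `mota` and the sample counts are hypothetical and chosen only for illustration. The competition's actual scoring script and matching rules are not described in this article.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_ground_truth: int) -> float:
    """Standard CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT.

    All counts are totals over every frame of the video.
    The value is at most 1.0 and can be negative when errors exceed GT.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


# Hypothetical counts, not taken from the competition results.
score = mota(false_negatives=120, false_positives=80,
             id_switches=10, num_ground_truth=1000)
print(f"MOTA = {score:.3f}")
```

Because missed text, spurious detections, and identity switches all count against the same total, a tracker has to be accurate at both detection and cross-frame association to score well.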
Task 2 evaluates the end-to-end performance of video text recognition. It requires text to be correctly detected in every frame, correctly tracked across video frames, and correctly recognized at the sequence level. The evaluation metric is OCR-MOTA, and Tencent won the championship with a lead of 11.93% over second place.
End-to-End Video Text Recognition: Champion Certificate
The SVRD (Structured Information Extraction) competition includes two major tracks, HUST-CELL and BAIDU-FEST, with a total of four tasks: complex document entity relationship extraction (E2E Complex Entity Linking), complex document entity semantic extraction (E2E Complex Entity Labeling), zero-shot structured information extraction (E2E Zero-shot Structured Text Extraction), and few-shot structured information extraction (E2E Few-shot Structured Text Extraction). The competition is quite challenging because document images have complex layouts and diverse structures, while images captured in natural scenes are collected irregularly and suffer from complex backgrounds, damage, bending, deformation, and other problems. Some competition images are shown below:
ICDAR-SVRD Structured Information Extraction Competition Example
The Tencent OCR team won two championships in the SVRD competition.
Task 2 (E2E Complex Entity Labeling) aims to extract semantic entities from complex document images, such as titles, organization names, dates, amounts, numbers, product names, and people's names. Tencent won the championship on this task by a large margin.
E2E Complex Entity Labeling: Champion Certificate
Task 4 (E2E Few-shot Structured Text Extraction) requires extracting key information from images in 10 different scenarios, given only a very small amount of training data. The scenarios include bank cards, business licenses, taxi invoices, shopping receipts, transportation invoices, quota invoices, papers, etc. Tencent also won the championship on this task.
E2E Few-shot Structured Text Extraction: Champion Certificate
According to reports, the Tencent OCR team is a professional team within Tencent dedicated to researching and developing OCR technology. The team has independently developed high-precision, high-stability text detection and recognition technology. In terms of application, it supports hundreds of business scenarios within Tencent, such as Tencent Advertising, WeChat, QQ, Tencent Cloud, Tencent Video, and Tencent's information-feed products.