Surya is an open-source OCR tool that has been updated with table recognition capabilities. It recognizes the rows, columns, and cells of tables, handles rotated tables and complex layouts, and supports over 90 languages. Surya is reported to outperform Table Transformer, the current SoTA open-source model, on table recognition. It currently has over 10,000 stars on GitHub, is free and open-source, and can be used in commercial scenarios subject to the license terms in its repository.
Core Features
- Table Recognition: The new version of Surya identifies the rows, columns, and cells within a table and also recognizes the text inside each cell. This is useful for anyone who needs to process large volumes of tabular data.
- Complex Layout Recognition: Beyond tables, Surya recognizes layout elements within documents, such as titles, images, and rotated tables, so the needed information can be extracted even from documents with complex structure.
- Support for Over 90 Languages: Surya performs OCR in over 90 languages, including Chinese, Japanese, Korean, and Arabic. This makes it suitable for multilingual work such as international business document processing and content conversion for localization projects.
- Efficient Text Recognition and Reading Order Determination: In addition to tables, Surya performs line-level text detection and determines the reading order of the text, so content is output in the correct sequence rather than scrambled.
- Local Operation and API Support: Surya runs locally, so developers can process sensitive information offline or handle documents at scale. It also provides an API, making it straightforward to integrate into applications for automated batch processing.
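To illustrate how the row, column, and cell output from table recognition might be used downstream, here is a minimal sketch that rebuilds a grid from cell records and writes it as CSV. The record shape (`row`, `col`, `text` keys) and the helper names `cells_to_grid` and `grid_to_csv` are assumptions made for illustration. Surya's actual output schema may differ, so the field names would need adapting.

```python
import csv
import io

# Hypothetical cell records; the keys are illustrative, not Surya's real schema.
cells = [
    {"row": 0, "col": 0, "text": "Item"},
    {"row": 0, "col": 1, "text": "Price"},
    {"row": 1, "col": 0, "text": "Coffee"},
    {"row": 1, "col": 1, "text": "3.50"},
]


def cells_to_grid(cells):
    """Arrange cell records into a row-major grid; missing cells stay empty."""
    if not cells:
        return []
    n_rows = max(c["row"] for c in cells) + 1
    n_cols = max(c["col"] for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c["row"]][c["col"]] = c["text"]
    return grid


def grid_to_csv(grid):
    """Serialize the grid as CSV text with one line per table row."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(grid)
    return buf.getvalue()


if __name__ == "__main__":
    print(grid_to_csv(cells_to_grid(cells)))
```

Filling a pre-sized grid instead of appending cells in arrival order means that a cell the recognizer missed leaves an empty slot rather than shifting its neighbors.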
GitHub Address: https://github.com/VikParuchuri/surya
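For the local batch processing mentioned in the feature list, one possible approach is to call the command-line tool once per document. The sketch below assumes the package is installed and exposes a `surya_ocr` command that takes a file path as its only positional argument. No other flags are passed, because they vary between versions. The helper names `is_document`, `build_command`, and `run_batch` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# File types treated as OCR input in this sketch (an assumption).
DOCUMENT_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}


def is_document(path):
    """Return True if the file extension looks like a PDF or image."""
    return Path(path).suffix.lower() in DOCUMENT_SUFFIXES


def build_command(path):
    """Build the argv list for one document; only the path is passed."""
    return ["surya_ocr", str(path)]


def run_batch(folder):
    """Run the CLI on every matching file in `folder`, stopping on the first failure."""
    if shutil.which("surya_ocr") is None:
        raise RuntimeError("surya_ocr not found on PATH; install the package first")
    for doc in sorted(Path(folder).iterdir()):
        if doc.is_file() and is_document(doc):
            subprocess.run(build_command(doc), check=True)
```

Calling `run_batch("scans/")` would process each PDF or image in a hypothetical `scans/` folder in sorted order. Because everything runs as a local process, no document content leaves the machine.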