

Morpho AI Solutions Developed OCR Program for National Diet Library by Using Latest AI Technologies

2022/04/28

Tokyo, Japan – April 28, 2022 – Morpho AI Solutions, Inc., the company responsible for the AI business of the Morpho Group, announced that it has completed the "Research and Development of OCR Processing Program" (hereinafter "the Project") commissioned by the National Diet Library.


As part of "Vision 2021-2025: The Digital Shift at the National Diet Library", the National Diet Library is working toward universal access, providing a wide range of information resources to all users now and in the future. It is also expanding the national digital information infrastructure that will support this goal on a permanent basis.


“Vision 2021-2025: The Digital Shift at the National Diet Library”: https://vision2021.ndl.go.jp/en/

In the Project, Morpho researched and developed an OCR processing program that incorporates its latest AI and image processing technologies and will be used to create text data from the images of materials in the National Diet Library Digital Collections. In addition, an OCR training dataset of approximately 13 million characters was built in cooperation with Toppan Inc.


The OCR processing program developed in 2021 supports a variety of layouts and character types, enabling text conversion of complex materials from the Meiji to Showa periods that existing OCR services cannot handle.

Research and development of text conversion for book images (200 million images) from the Meiji to Showa periods

(1) Support for complex layouts

(2) Support for a variety of character types (old kanji characters and old phonetic characters)

(3) Improved accuracy of the OCR processing program

Books and magazines from the 1860s onward can now be recognized with more than 90% accuracy, which is much higher than commercial OCR achieves. In particular, for modern books and magazines from the Meiji period to the early Showa period, reading accuracy more than doubled compared with commercial OCR (from approximately 40% to over 90%).

Comments from the Research and Development for Next-Generation Systems Office, National Diet Library

"NDLOCR, the Japanese OCR program developed in this Project, was released as open source on April 25, 2022, from the official NDL Lab GitHub account (https://github.com/ndl-lab). NDLOCR supports additional training on top of its original training data. It will be used to create full-text data for materials that the National Diet Library digitizes in the future. In addition to the program, the machine learning dataset used for development will be made available soon (limited to the portion created from digitized materials whose copyright protection period has expired). We hope that NDLOCR will help improve the accuracy of Japanese OCR as a whole, and that many interested parties will benefit from it."

About Morpho AI Solutions, Inc.

Morpho AI Solutions is a company engaged in the commercialization of AI (Artificial Intelligence). We promote the introduction and actual operation of cutting-edge AI technologies, including AI-OCR, in the areas of social infrastructure such as government, electric power, transportation, and manufacturing.

For more information, visit https://www.morphoai.com.