How It Works

READ currently processes documents in a range of image and PDF formats, with the flexibility to support more in the future. Here is a READ demo showcasing its workflow and features, using a publicly available passport sample as an example.

READ also outperforms commercial solutions in processing real-world documents, delivering greater accuracy and reliability for automated document processing. Here is a comparison of READ and AWS Textract, highlighting their capability differences in processing tilted, photo-captured documents.

Its backend architecture is built on five core components:

  • Image Preprocessing
  • Optical Character Recognition (OCR)
  • Document and Page Classifier
  • Key Information Extraction (KIE)
  • Post-processing
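The five components above form a sequential pipeline: each stage consumes the previous stage's output. A minimal sketch of that flow, with placeholder stages standing in for READ's actual (unpublished) implementations:

```python
from typing import Any, Callable

def run_pipeline(document: Any, stages: list[Callable[[Any], Any]]) -> Any:
    """Pass a document through each processing stage in order."""
    for stage in stages:
        document = stage(document)
    return document

# Placeholder stages mirroring the five components listed above;
# the keys and values here are illustrative, not READ's real schema.
stages = [
    lambda d: {**d, "preprocessed": True},            # image preprocessing
    lambda d: {**d, "text": "OCR output"},            # OCR
    lambda d: {**d, "doc_type": "passport"},          # document/page classifier
    lambda d: {**d, "fields": {"name": "JANE DOE"}},  # KIE
    lambda d: {**d, "validated": True},               # post-processing
]

result = run_pipeline({"raw": b"..."}, stages)
```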

Image Preprocessing: Enhancing Real-World Submissions

A major challenge in commercial solutions is their reliance on high-quality document inputs, which often do not reflect real-world submissions. Many government agencies receive blurred, tilted, or low-quality scans, leading to suboptimal performance with existing solutions.

To address this, the READ team conducted in-depth research on real-life document characteristics and developed custom preprocessing techniques to improve OCR accuracy. These include normalisation, cropping, deskewing and text enhancement.
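To illustrate one of the named techniques, here is a minimal sketch of contrast normalisation via min-max scaling, which stretches a washed-out scan to the full greyscale range. READ's actual preprocessing is more sophisticated; this only shows the idea:

```python
import numpy as np

def normalise(image: np.ndarray) -> np.ndarray:
    """Min-max contrast normalisation: stretch pixel values to span 0-255.

    Illustrative only; READ's real normalisation step is not public.
    """
    img = image.astype(np.float32)
    lo, hi = img.min(), img.max()
    if hi == lo:  # flat image: nothing to stretch
        return np.zeros_like(image, dtype=np.uint8)
    return ((img - lo) / (hi - lo) * 255).astype(np.uint8)

# A washed-out scan occupying only a narrow band of grey values
faded = np.array([[100, 110], [120, 130]], dtype=np.uint8)
enhanced = normalise(faded)  # now spans the full 0-255 range
```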

OCR: Optimised for Unique Fonts and Complex Documents

READ leverages PaddleOCR as the backbone for OCR tasks, but unlike off-the-shelf solutions, the model is fine-tuned to recognise specialised fonts and unique document structures. For low-quality documents, READ internally runs local OCR multiple times, focusing on zoomed-in views to capture fine detail. This is particularly useful for government forms, licences, and handwritten documents, where standard OCR engines struggle.
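The multi-pass, zoomed-in strategy can be sketched as tiling the page into overlapping crops and running a local OCR pass on each. The tile size, overlap, and merge step below are assumptions for illustration, not READ's real parameters:

```python
import numpy as np

def zoom_crops(image: np.ndarray, tile: int = 256, overlap: int = 32):
    """Yield overlapping crops so a local OCR pass can 'zoom in' on detail."""
    h, w = image.shape[:2]
    step = tile - overlap
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            yield (y, x), image[y:y + tile, x:x + tile]

def multi_pass_ocr(image, ocr_fn):
    """Run `ocr_fn` on every crop, keeping each crop's page offset so the
    per-crop results can later be merged back into page coordinates."""
    return [(pos, ocr_fn(crop)) for pos, crop in zoom_crops(image)]

# A 512x512 page with tile=256 and overlap=32 yields a 3x3 grid of crops
results = multi_pass_ocr(np.zeros((512, 512), dtype=np.uint8), lambda c: "")
```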

Document and Page Classifier

READ uses an ensemble model for document and page type classification, ensuring accurate identification and verification: it confirms that the correct document type has been submitted and that the correct pages are processed.
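One common ensembling scheme is a majority vote across several classifiers; the article does not say which scheme READ actually uses, so treat this as a sketch with stub models:

```python
from collections import Counter
from typing import Any, Callable

def ensemble_classify(page: Any, models: list[Callable[[Any], str]]) -> str:
    """Classify a page by majority vote over an ensemble of models."""
    votes = Counter(model(page) for model in models)
    return votes.most_common(1)[0][0]

# Stub models standing in for real page classifiers
models = [lambda p: "passport", lambda p: "passport", lambda p: "invoice"]
label = ensemble_classify(None, models)  # → "passport"
```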

Key Information Extraction (KIE): Multimodal Understanding

After completing the OCR step, READ applies Key Information Extraction (KIE) using LayoutLM, a deep learning model designed for document intelligence. This enables READ to:

  • Analyse both textual and visual elements such as tables, stamps, and handwritten notes.
  • Fine-tune entity recognition to extract fields specific to a document type, such as receipt numbers, supplier names, or dates.
  • Customise extraction models to meet the unique needs of various government agencies.
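LayoutLM-family models take each OCR token together with its bounding box normalised to a 0–1000 coordinate grid, which is how the model combines textual and spatial signals. A sketch of that input preparation (the OCR output shown is illustrative):

```python
def normalise_bbox(bbox, page_width, page_height):
    """Scale a pixel-space box (x0, y0, x1, y1) to LayoutLM's 0-1000 grid."""
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]

# Suppose OCR placed a word at (85, 40)-(170, 62) on an 850x1100 page
box = normalise_bbox((85, 40, 170, 62), 850, 1100)  # → [100, 36, 200, 56]
```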

Post-processing

READ applies dedicated post-processing to KIE results, including date normalisation, typo correction, and field validation, to enhance accuracy and data consistency.
