
READ currently processes documents in a range of image and PDF formats, with the flexibility to support more in the future. The demo below showcases its workflow and features, using a publicly available passport sample as an example.
READ also outperforms commercial solutions on real-world documents, delivering greater accuracy and reliability for automated document processing. The comparison below pits READ against AWS Textract, highlighting how their capabilities differ when processing tilted, photo-captured documents.
Its backend architecture is built on five core components:
- Image Preprocessing
- Optical Character Recognition (OCR)
- Document and Page Classifier
- Key Information Extraction (KIE)
- Post-processing
Image Preprocessing: Enhancing Real-World Submissions
A major challenge with commercial solutions is their reliance on high-quality document inputs, which often do not reflect real-world submissions. Many government agencies receive blurred, tilted, or low-quality scans, on which existing solutions perform poorly.
To address this, the READ team conducted in-depth research on real-life document characteristics and developed custom preprocessing techniques to improve OCR accuracy. These include normalisation, cropping, deskewing and text enhancement.
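READ's preprocessing code is not public, but a minimal sketch of two of the named steps, deskewing and text enhancement, might look like the following in OpenCV. The function name, thresholds, and CLAHE settings are illustrative assumptions, not READ's actual pipeline.

```python
import cv2
import numpy as np

def preprocess(path: str) -> np.ndarray:
    """Deskew a scanned page and enhance faint text before OCR."""
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)

    # Estimate skew from the minimum-area rectangle around text pixels.
    # Note: minAreaRect's angle convention changed in OpenCV 4.5; a
    # production deskewer would also disambiguate via the aspect ratio.
    thresh = cv2.threshold(gray, 0, 255,
                           cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    ys, xs = np.where(thresh > 0)
    pts = np.column_stack([xs, ys]).astype(np.float32)
    angle = cv2.minAreaRect(pts)[-1]
    if angle > 45:
        angle -= 90  # map into (-45, 45] so we rotate the short way

    # Rotate the page back to horizontal.
    h, w = gray.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    deskewed = cv2.warpAffine(gray, M, (w, h), flags=cv2.INTER_CUBIC,
                              borderMode=cv2.BORDER_REPLICATE)

    # Enhance low-contrast text with adaptive histogram equalisation.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(deskewed)
```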
OCR: Optimised for Unique Fonts and Complex Documents
READ leverages PaddleOCR as the backbone for its OCR tasks, but unlike off-the-shelf solutions, the model is fine-tuned to recognise specialised fonts and unique document structures. For low-quality documents, READ internally runs OCR multiple times over zoomed-in views to capture fine details. This is particularly useful for government forms, licences, and handwritten documents, where standard OCR engines struggle.
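The multi-pass detail is only described at a high level, so the sketch below is one plausible reading of it: run PaddleOCR over the full page, then again over upscaled tiles. The tiling grid, scale factor, and the `ocr_with_zoom` helper are assumptions; note also that tile boxes would need remapping to page coordinates in real use.

```python
import cv2
from paddleocr import PaddleOCR  # pip install paddleocr paddlepaddle

ocr = PaddleOCR(use_angle_cls=True, lang="en")  # detection + recognition

def ocr_with_zoom(img, rows=2, cols=2, scale=2.0):
    """Run OCR on the full page, then on upscaled tiles to recover
    details that a single low-resolution pass misses."""
    results = ocr.ocr(img, cls=True)[0] or []
    h, w = img.shape[:2]
    for r in range(rows):
        for c in range(cols):
            tile = img[r * h // rows:(r + 1) * h // rows,
                       c * w // cols:(c + 1) * w // cols]
            zoomed = cv2.resize(tile, None, fx=scale, fy=scale,
                                interpolation=cv2.INTER_CUBIC)
            results.extend(ocr.ocr(zoomed, cls=True)[0] or [])
    # Each entry is [bounding_box, (text, confidence)]; boxes from
    # zoomed tiles are in tile coordinates, so only text is returned.
    return [(text, conf) for _, (text, conf) in results]
```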
Document and Page Classifier
READ uses an ensemble model for document and page type classification, verifying that the correct document type has been submitted and that the correct pages are processed.
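The composition of READ's ensemble is not described, so the following is a generic soft-voting sketch with hypothetical document types and model outputs: each member model produces class probabilities, and a weighted average decides the label.

```python
import numpy as np

# Hypothetical document types; READ's actual label set is not public.
DOC_TYPES = ["passport", "invoice", "licence", "form"]

def ensemble_classify(prob_vectors, weights=None):
    """Soft-voting ensemble: weighted average of each model's class
    probabilities, returning the top label and its combined score."""
    probs = np.average(np.vstack(prob_vectors), axis=0, weights=weights)
    idx = int(np.argmax(probs))
    return DOC_TYPES[idx], float(probs[idx])

# Example: a vision model and a text model disagree; the ensemble
# resolves toward the jointly most likely type.
vision = [0.55, 0.30, 0.10, 0.05]
text   = [0.35, 0.50, 0.10, 0.05]
label, score = ensemble_classify([vision, text], weights=[0.6, 0.4])
print(label, round(score, 3))  # passport 0.47
```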
Key Information Extraction (KIE): Multimodal Understanding
After completing the OCR step, READ applies Key Information Extraction (KIE) using LayoutLM, a deep learning model designed for document intelligence (see the sketch after this list). This enables READ to:
- Analyse both textual and visual elements such as tables, stamps, and handwritten notes.
- Fine-tune entity recognition to extract specific fields such as receipt numbers, supplier names, or dates.
- Customise extraction models to meet the unique needs of various government agencies.
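A minimal sketch of LayoutLM-based KIE, using the Hugging Face `transformers` implementation, is shown below. The label schema, sample words, and bounding boxes are hypothetical; the base checkpoint here has an untrained classification head, whereas READ would use a model fine-tuned on labelled agency documents, as the list above notes.

```python
import torch
from transformers import LayoutLMTokenizer, LayoutLMForTokenClassification

# Hypothetical label set; READ's actual schema is agency-specific.
LABELS = ["O", "B-RECEIPT_NO", "B-SUPPLIER", "B-DATE"]

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMForTokenClassification.from_pretrained(
    "microsoft/layoutlm-base-uncased", num_labels=len(LABELS))
# In practice this would be a checkpoint fine-tuned on labelled documents.

# OCR output: words plus boxes normalised to the 0-1000 grid LayoutLM expects.
words = ["Receipt", "No.", "12345", "ACME", "Pte", "Ltd"]
boxes = [[60, 40, 150, 60], [155, 40, 190, 60], [200, 40, 270, 60],
         [60, 90, 130, 110], [135, 90, 175, 110], [180, 90, 225, 110]]

# Tokenise word by word so each sub-token keeps its word's box.
tokens, bbox = [tokenizer.cls_token], [[0, 0, 0, 0]]
for word, box in zip(words, boxes):
    sub = tokenizer.tokenize(word)
    tokens += sub
    bbox += [box] * len(sub)
tokens.append(tokenizer.sep_token)
bbox.append([1000, 1000, 1000, 1000])

input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])
bbox = torch.tensor([bbox])

with torch.no_grad():
    logits = model(input_ids=input_ids, bbox=bbox).logits
for tok, pred in zip(tokens, logits.argmax(-1).squeeze().tolist()):
    print(tok, LABELS[pred])
```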
Post-processing
READ applies dedicated post-processing to KIE results, including date normalisation, typo correction, and field validation, to enhance accuracy and data consistency.
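Using only the Python standard library, date normalisation and field validation along these lines might look as follows. The accepted date formats, the OCR character substitutions, and the receipt-number rule are all assumptions for illustration.

```python
from datetime import datetime

DATE_FORMATS = ("%d/%m/%Y", "%d %b %Y", "%Y-%m-%d", "%d-%m-%Y")

def normalise_date(raw: str) -> str | None:
    """Normalise the many date styles seen on documents to ISO 8601,
    retrying with common OCR character fixes if the first pass fails."""
    candidates = [raw.strip()]
    # Assumed typo corrections: OCR often reads 0 as O and 1 as l.
    candidates.append(candidates[0].replace("O", "0").replace("l", "1"))
    for text in candidates:
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(text, fmt).date().isoformat()
            except ValueError:
                continue
    return None  # unparseable: flag for review rather than guessing

def validate_receipt_no(value: str) -> bool:
    """Hypothetical field rule: receipt numbers are 5-10 digits."""
    return value.isdigit() and 5 <= len(value) <= 10

print(normalise_date("O3/12/2024"))  # '2024-12-03' after the O->0 fix
print(validate_receipt_no("12345"))  # True
```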
