OCR and Data Extraction SDK
Overview
Aquaforest SDK is a powerful toolset for processing PDFs, including:
- PDF content extraction
- Searchable PDF Creation
- OCR with Standard (Aquaforest) Engine
- OCR with Extended (Canon IRIS) Engine
- Handwriting OCR options via Google & Microsoft APIs
- Advanced PDF and Barcode Toolkit
- High Performance with Support for up to 64 Cores
Main Features
The SDK is able to analyse PDF documents and automatically extract name/value pairs.
The SDK has a wide variety of PDF manipulation capabilities, including PDF merging, PDF attachment processing, PDF content extraction, XMP metadata processing, PDF/A validation and more.
The Standard OCR Engine supports 23 languages and is included in every edition of the SDK.
The Extended OCR Engine supports 131 languages and is included in the Extended Edition licenses.
The SDK provides an interface to Google and Microsoft’s cloud OCR services, which can be especially useful for special cases such as handwriting recognition.
The SDK is able to read and recognize most standard barcode types.
Get a Quote
Please contact the sales team for pricing information.
Licensing
License Comparison Table
Edition Comparison | Standard | Extended |
---|---|---|
PDF Toolkit | ||
Data Extraction from PDF documents without the need for templates or prior training | ||
Barcode Decoding | ||
OCR from bitmap, TIFF and PDF | ||
Microsoft Cloud OCR (requires additional Microsoft Subscription) | ||
Google Cloud OCR (requires additional Google Subscription) | ||
Image Pre-Processing and Auto-Rotation | ||
.NET Programmatic and Zonal access to OCR results | ||
RTF and TXT output | ||
Blank Page Removal | ||
PDF Merging | ||
Searchable PDF Output | ||
Stamps on PDF Output | ||
Advanced MRC and JBIG2 Compressed PDF Output | ||
Advanced Pre-processing (Optimized OCR) | ||
Aquaforest OCR Support for 23 languages | ||
Extended IRIS OCR with Support for 131 languages | ||
Support for multiple languages within a single document from the same character set | ||
Multiple document output formats: PDF, DOCX, WORDML, RTF, CSV, XLSX, EXCELML, TXT, HTML and XPS | ||
Multiple PDF version output support | ||
Confidence score support | ||
Asian Language Support | ||
Arabic Language Support | ||
Hebrew Language Support | ||
Intelligent High Quality Compression | ||
FAQ
What languages are supported?
The Standard bundle includes support for 23 languages.
The Extended bundle includes support for 131 languages.
The Extended bundle language list includes Chinese (Traditional and Simplified), Japanese, Korean, Thai and Vietnamese.
Can I just get a demo?
We can demonstrate the product for you and discuss how it can meet your needs.
Do you offer any free advice?
How can I contact you?
Email
We aim to respond to email support requests within half a business day, and usually respond much more quickly than that. Email support@aquaforest.com with any support query.
Phone support
If you prefer to speak directly with our team, call us on +44 (0)1296 768 727 or request a call via support@aquaforest.com.
Live chat
You can contact us on live chat during office hours.
Tech Spec
Searchable PDFs
Aquaforest’s OCR engine, capable of processing thousands of pages per hour, is used to recognise text from source TIFF and Image-Only PDF files and to create Searchable PDF files.
PDF Data Extraction
The Aquaforest Data Extractor allows data extraction from PDF documents without the need for templates or prior training. The software is able to read the PDF text and extract important key-value pairs automatically, making processing of files with various layouts easy.
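To make the idea of a key-value pair concrete, here is a minimal Python sketch that pulls `Label: value` pairs out of text that has already been extracted from a page. It is not the Aquaforest Data Extractor or its API: the function name `extract_pairs`, the regular expression and the sample invoice text are all hypothetical, and a real template-free extractor has to cope with far more varied layouts than a single colon-separated pattern.

```python
import re

# Hypothetical illustration only; not part of the Aquaforest SDK.
# Matches lines shaped like "Label: value" (label starts with a letter).
PAIR_PATTERN = re.compile(
    r"^[ \t]*([A-Za-z][A-Za-z0-9 ./#-]*?)[ \t]*:[ \t]*(\S.*?)[ \t]*$",
    re.MULTILINE,
)


def extract_pairs(text):
    """Return a dict of label -> value for every 'Label: value' line."""
    return {m.group(1): m.group(2) for m in PAIR_PATTERN.finditer(text)}


# Made-up sample text standing in for the text layer of a PDF page.
sample = """Invoice Number: INV-1042
Date: 2021-03-15
Total Due: 1,250.00 GBP
Thank you for your business."""

pairs = extract_pairs(sample)
for label, value in pairs.items():
    print(f"{label} = {value}")
```

Lines without a colon, such as the closing sentence in the sample, are simply skipped.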
Image Preprocessing
For optimal OCR results, options are available to control deskew, despeckle, graphics area treatment and auto-rotate.
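As a rough illustration of what a despeckle step does, the Python sketch below removes isolated black pixels from a small binary image. This is one simple, generic approach and is not the algorithm used by the SDK; the function name `despeckle` and the list-of-lists image representation (1 for ink, 0 for background) are assumptions made for the example.

```python
def despeckle(image):
    """Return a copy of a binary image with isolated black pixels removed.

    `image` is a list of rows; 1 means black (ink), 0 means white.
    A black pixel is treated as a speckle when none of its up to eight
    neighbours is black. Hypothetical helper, not an SDK function.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]  # leave the input untouched
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            has_black_neighbour = any(
                image[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_black_neighbour:
                cleaned[y][x] = 0
    return cleaned


# Made-up 4x5 scan: one stray pixel at row 1, column 1, plus a small stroke.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
]
cleaned_page = despeckle(page)
```

The stray pixel is cleared while the connected stroke in the lower right is kept, which is why this kind of cleanup tends to help recognition on noisy scans.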
Simple .NET integration
The SDK has been designed to be simple to integrate with .NET applications, and complete samples are provided in C#, VB.NET and ASP.NET.
Fully Searchable PDF Generation
The SDK can be used to generate fully text searchable PDFs with the original image and a transparent text layer.
System requirements
Supported Operating Systems | Windows 10, Windows Server 2012 R2, Windows Server 2016, Windows Server 2019 |
Minimum Memory | Single Core License - 4 GB RAM |
Recommended Memory | Single Core License - 8 GB RAM; 8 Core License - 16 GB RAM; Greater Than 8 Core License - Ask support@aquaforest.com |
Recommended CPU | Single Core License - i5 processor; 8 Core License - i7 processor; Greater Than 8 Core License - Ask support@aquaforest.com |
Disk Space | Clean Install: 1.31 GB; All Samples Compiled: 4.75 GB |
.NET Framework | 4.7.2 |
Visual C++ Runtime | The Visual C++ Redistributable package is required for deployment as well as development. The Aquaforest engine requires the Visual C++ 2017 Redistributable (x86 / x64). |
Autobahn DX
Start using Autobahn DX today and convert your archives to fully text-searchable PDFs.