This article highlights the key features of each OCR engine to help you decide which one to use for your project.
Language Support:
The Aquaforest OCR engine supports 23 languages (primarily European languages). See this document for a list of the supported languages.
The Extended (IRIS) OCR engine supports over 127 languages, including Asian languages. See this document for the full list of supported languages.
The Extended (IRIS) OCR engine also allows multiple languages to be specified so that several languages can be recognized in a single document. The languages must be from the same character set.
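The same-character-set rule can be checked before a document is submitted. The following is a minimal sketch rather than the engine's API: the `SCRIPT_OF` table and the `validate_languages` helper are hypothetical, the table is a small illustrative subset and not the engine's supported-language list, and it assumes that "character set" corresponds to the writing system.

```python
# Hypothetical lookup: language name -> writing system (illustrative subset only).
SCRIPT_OF = {
    "English": "Latin",
    "French": "Latin",
    "German": "Latin",
    "Russian": "Cyrillic",
    "Bulgarian": "Cyrillic",
    "Greek": "Greek",
}


def validate_languages(languages):
    """Return the shared script, or raise ValueError if the scripts differ."""
    if not languages:
        raise ValueError("at least one language is required")
    unknown = [lang for lang in languages if lang not in SCRIPT_OF]
    if unknown:
        raise ValueError(f"unknown language(s): {', '.join(unknown)}")
    scripts = {SCRIPT_OF[lang] for lang in languages}
    if len(scripts) > 1:
        raise ValueError(f"languages span several character sets: {sorted(scripts)}")
    return scripts.pop()
```

With this table, `validate_languages(["English", "French"])` returns the shared script, while `["English", "Russian"]` raises `ValueError` because the two languages use different character sets.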
Output File Formats:
The Aquaforest OCR engine can generate the following output file formats:
- TXT
- RTF
The Extended (IRIS) OCR engine can generate the following output file formats:
- TXT
- RTF
- DOCX
- EXCELML
- HTML
- CSV
- XPS
- XLSX
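If the required output format is the deciding factor, the two lists above can be turned into a simple lookup. This is only an illustrative sketch: the `OUTPUT_FORMATS` table restates the formats listed in this article, and the `engines_for` helper is a hypothetical name, not part of either engine.

```python
# Output formats per engine, as listed in this article.
OUTPUT_FORMATS = {
    "Aquaforest": {"TXT", "RTF"},
    "Extended (IRIS)": {"TXT", "RTF", "DOCX", "EXCELML", "HTML", "CSV", "XPS", "XLSX"},
}


def engines_for(output_format):
    """Return the engines (sorted by name) that can produce the given format."""
    wanted = output_format.strip().upper()
    return sorted(name for name, formats in OUTPUT_FORMATS.items() if wanted in formats)
```

For example, `engines_for("rtf")` lists both engines, whereas `engines_for("docx")` lists only the Extended (IRIS) engine.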
Compression:
The Aquaforest OCR engine includes JBIG2 compression for black and white images and MRC for color images.
The Extended (IRIS) OCR engine offers the optional IHQC module, which enables IRIS’ Intelligent High Quality Compression technology for strong PDF compression without compromising the visual quality, text resolution, or legibility of documents.
Pre-Processing Options:
Pre-processing options, such as de-skew and auto-rotate, can be applied to an image before recognition to help ensure optimal OCR performance.
The Aquaforest OCR engine provides the following pre-processing options:
Auto-rotate: rotates the image if required.
Line removal: removes lines from the image.
De-skew: straightens the image.
De-speckle: removes specks from the image.
Binarize: whether to perform binarization on color images.
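To make the Binarize option concrete: binarization reduces every pixel to pure black or pure white. The sketch below is a generic fixed-threshold illustration and not Aquaforest's implementation; the `to_gray` and `binarize` functions, the nested-list image representation, and the default threshold of 128 are all assumptions made for the example.

```python
def to_gray(rgb):
    """Convert an (r, g, b) pixel to one 0-255 luminance value (BT.601 weights)."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def binarize(color_image, threshold=128):
    """Return rows of 0 (black) / 255 (white) for rows of (r, g, b) pixels."""
    return [
        [255 if to_gray(pixel) >= threshold else 0 for pixel in row]
        for row in color_image
    ]
```

A dark red pixel such as `(200, 30, 30)` falls below the threshold and becomes black, while a pale yellow pixel such as `(250, 250, 200)` becomes white.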
The Extended (IRIS) OCR engine provides a more comprehensive set of pre-processing options, listed below:
De-speckle: removes specks from the image.
Auto-rotate: rotates the image if required.
Line removal: removes lines from an image (the image must be black and white).
RemoveWhitePixels: By default, de-speckle removes black pixels. If set to true, de-speckle removes white pixels rather than black pixels.
Binarization: Whether or not to perform binarization on the document.
Brightness: The brightness (higher values will darken the result).
Contrast: The contrast (lower values will darken the result).
SmoothingLevel: Smoothing may be useful to binarize text with a colored background in order to avoid noisy pixels (0 disables smoothing, higher values smooth more).
Threshold: Sets the threshold for fixed threshold binarization (0 for automatic threshold computation).
HorizontalCleanX: The parameter for cleaning noisy pixels attached to the horizontal lines.
HorizontalCleanY: The parameter for cleaning noisy pixels attached to the horizontal lines.
VerticalCleanX: The parameter for cleaning noisy pixels attached to the vertical lines.
VerticalCleanY: The parameter for cleaning noisy pixels attached to the vertical lines.
HorizontalDilate: The dilate parameter that helps the detection of horizontal lines.
VerticalDilate: The dilate parameter that helps the detection of vertical lines.
HorizontalMaxGap: The maximum horizontal line gap to close. It is useful for removing broken lines.
VerticalMaxGap: The maximum vertical line gap to close. It is useful for removing broken lines.
HorizontalMaxThickness: The maximum thickness of the horizontal lines to remove. It is useful for keeping vertical lines larger than this parameter. It can also be useful for keeping vertical letter strokes.
VerticalMaxThickness: The maximum thickness of the vertical lines to remove. It is useful for keeping horizontal lines larger than this parameter. It can also be useful for keeping horizontal letter strokes.
HorizontalMinLength: The minimum length of the horizontal lines to remove.
VerticalMinLength: The minimum length of the vertical lines to remove.
RemoveDarkBorders: Removes the dark surround from bitonal, grayscale, or color images by whitening it. (Note: the dark border must touch the edge of the page for this to work.)
Interpolation: Interpolates the source image to the given resolution. This value (the target resolution) must be greater than the source image’s resolution.
InterpolationMode: Sets the interpolation mode.
KeepOriginalImage: Keeps the original image as it is.
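As an illustration of how a minimum-length setting such as HorizontalMinLength might behave, the sketch below whitens only those horizontal runs of black pixels that reach a given length, so short strokes (such as parts of letters) survive. It is a simplified stand-in and not the IRIS algorithm: the `remove_horizontal_lines` function and the list-of-rows image format are hypothetical, and the thickness, gap, clean, and dilate parameters are ignored.

```python
def remove_horizontal_lines(image, min_length):
    """Whiten horizontal runs of black pixels that are at least min_length long.

    image is a list of rows in which 1 is black and 0 is white.
    A new image is returned; the input is left unchanged.
    """
    result = [row[:] for row in image]
    for row in result:
        start = None
        # A white sentinel is appended so a run touching the right edge is closed.
        for x, pixel in enumerate(row + [0]):
            if pixel == 1 and start is None:
                start = x  # a black run begins here
            elif pixel != 1 and start is not None:
                if x - start >= min_length:
                    row[start:x] = [0] * (x - start)  # erase the long run
                start = None
    return result
```

Calling the function with a larger `min_length` makes removal more conservative, because fewer runs are long enough to count as lines.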
If you have any questions, please email the support team, who will be happy to assist you with your query.