Pre-Processing options in Autobahn DX and OCR SDK

This post illustrates the effect that pre-processing settings can have on a document.  A number of pre-processing options are available from the GUI and can be used to improve recognition in the OCR process.  The pre-processing settings are used only to prepare your image before it is passed to the OCR engine, so that the engine can achieve the best possible results; the page images in the output PDF will still match the source input file.

In addition to the options available from the GUI, you can also specify some pre-processing options in the advance flag.  A number of morphological options can be applied through the advance flag to binarized images before they are OCR’d.  The most common options are listed below.

The syntax for enabling the Morph settings in the advance flag is -y followed by one of the values below, e.g. -y c2.2, -y d2.2 or -y e2.2

d2.2 – 2×2 dilation applied to all black pixel areas, useful for faint prints.

e2.2 – 2×2 erosion applied to all black pixel areas, useful for heavy prints.

c2.2 – closing process that performs a 2×2 dilation followed by a 2×2 erosion with the result that holes and gaps in the characters are filled.
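To make the three operations concrete, here is a minimal, self-contained Python sketch of 2×2 dilation, erosion and closing on a tiny binarized image. It is only an illustration of the general morphology technique and is not how the Aquaforest engine implements the Morph option. The function names, the '#'/'.' text rendering and the choice to anchor the 2×2 window at its top-left pixel are all assumptions made for this example.

```python
# Illustrative only: generic 2x2 binary morphology, not the Aquaforest engine.
# 1 = black pixel, 0 = white pixel.

# Offsets covered by a 2x2 window anchored at its top-left pixel (assumption).
KERNEL_2X2 = [(0, 0), (0, 1), (1, 0), (1, 1)]


def _get(img, r, c):
    """Return pixel (r, c), treating anything outside the image as white."""
    if 0 <= r < len(img) and 0 <= c < len(img[0]):
        return img[r][c]
    return 0


def dilate(img):
    """A pixel becomes black if any black pixel reaches it through the window."""
    return [[int(any(_get(img, r - dr, c - dc) for dr, dc in KERNEL_2X2))
             for c in range(len(img[0]))] for r in range(len(img))]


def erode(img):
    """A pixel stays black only if the whole 2x2 window starting at it is black."""
    return [[int(all(_get(img, r + dr, c + dc) for dr, dc in KERNEL_2X2))
             for c in range(len(img[0]))] for r in range(len(img))]


def closing(img):
    """Dilation followed by erosion, as described for c2.2 above."""
    # Pad one white row and column so strokes touching the edge are not lost.
    padded = [row + [0] for row in img] + [[0] * (len(img[0]) + 1)]
    closed = erode(dilate(padded))
    return [row[:-1] for row in closed[:-1]]


def parse(text):
    """Turn lines of '#' (black) and '.' (white) into a 0/1 grid."""
    return [[1 if ch == "#" else 0 for ch in line] for line in text.split()]


def render(img):
    """Turn a 0/1 grid back into lines of '#' and '.'."""
    return "\n".join("".join("#" if p else "." for p in row) for row in img)


# A short stroke with a one-pixel-wide gap in the middle.
SAMPLE = parse("""
........
.##.##..
.##.##..
........
""")

for name, op in [("dilate", dilate), ("erode", erode), ("closing", closing)]:
    print(name)
    print(render(op(SAMPLE)))
```

In this sample, dilation thickens the stroke and bridges the gap, erosion thins each half down to a single pixel, and closing fills the gap while leaving the outer outline of the stroke unchanged.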

In order to see how the pre-processing settings affect an image, we need to analyse the intermediary files, which by default are deleted once a file has been OCR’d.  To retain the temporary files for analysis, also enter -b in the advance flag, so a typical value in the advance flag will be: -b -y c2.2.  The temporary files are stored in the following location:

  • C:\Users\username\AppData\Local\Temp\AquaforestOcr
  • also accessible by %temp%\AquaforestOcr
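If you want to check quickly which intermediary files were retained, a small script can list the contents of that folder. The following Python sketch assumes only the folder name given above; the function name and the newest-first ordering are choices made for this example, and it makes no assumption about what the files are called.

```python
import tempfile
from pathlib import Path


def list_intermediate_files(temp_root=None):
    """Return files under <temp>/AquaforestOcr, newest first (empty if absent)."""
    # tempfile.gettempdir() honours the TEMP/TMP environment variables,
    # so on Windows it normally resolves to the same place as %temp%.
    root = Path(temp_root or tempfile.gettempdir()) / "AquaforestOcr"
    if not root.is_dir():
        return []
    files = [p for p in root.rglob("*") if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


for path in list_intermediate_files():
    print(path)
```

Run it after an OCR job that used -b; if nothing is printed, the folder is either missing or empty.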

Below is an example illustrating the above settings being applied to the original image.

These settings can also be applied in the OCR SDK; below are examples of how these settings can be enabled in C#.

To view the intermediary files in the temp location, ensure _ocr.DeleteTemporaryFiles(); is not called.

For closing use: _preProcessor.Morph = "c2.2";

For dilation use: _preProcessor.Morph = "d2.2";

For erosion use: _preProcessor.Morph = "e2.2";

Author

Neil Pitman

Head of IT Business Solutions

Neil established Aquaforest in 2001 to provide high-performance PDF, OCR, and SharePoint products to a worldwide market.
