When using the Aquaforest OCR SDK, you may intermittently receive the following exception in your application:
System.IO.FileNotFoundException was caught
FileName=C:\WINDOWS\TEMP\AquaforestOcr\xxxx_xx\x_x.hocr
Message=Could not find file 'C:\WINDOWS\TEMP\AquaforestOcr\xxxx_xx\x_x.hocr'
This exception is raised because the source file was not OCR’d, although the wording of the message does not make that clear. To resolve the issue, subscribe to the StatusUpdate event, which gives you access to StatusUpdateEventArgs. An instance of this class is supplied for each page processed and provides information about the processing outcome for that page, so you can check whether text was extracted before trying to read it.
Properties
Below are the properties of this class.
- int PageNumber This property returns the number of the page to which the object relates.
- int Rotation A value from 0 to 3 which indicates the rotation used for the output in terms of the number of 90° steps away from the orientation in which the input page was provided. If AutoRotation is set to false this will always be 0.
- double ConfidenceScore Generally, a value of 1 or greater indicates reasonable OCR of a page, but this should be confirmed using “typical” source files.
- bool TextAvailable This property indicates whether text was extracted for the page.
- bool ImageAvailable This property indicates whether an image (after all appropriate pre-processing) was successfully extracted.
- bool BlankPage This property indicates whether the page was detected as blank.
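The properties above can be combined into a simple per-page summary. The following is a minimal sketch rather than SDK code: PageStatus is a hypothetical stand-in that mirrors the documented properties of StatusUpdateEventArgs so that the snippet compiles without the SDK, PageSummary is a hypothetical helper, and the threshold of 1.0 is only the rule of thumb given above and should be tuned against typical source files.

```csharp
using System;

// Hypothetical stand-in mirroring the documented StatusUpdateEventArgs
// properties, so this sketch compiles without the SDK installed.
public class PageStatus
{
    public int PageNumber { get; set; }
    public int Rotation { get; set; }            // 0 to 3, in 90-degree steps
    public double ConfidenceScore { get; set; }
    public bool TextAvailable { get; set; }
    public bool ImageAvailable { get; set; }
    public bool BlankPage { get; set; }
}

// Hypothetical helper, not part of the SDK.
public static class PageSummary
{
    // Assumed threshold; confirm it against your own documents.
    public const double MinConfidence = 1.0;

    // Converts the Rotation step count into degrees.
    public static int RotationDegrees(PageStatus s)
    {
        return s.Rotation * 90;
    }

    // Builds a one-line description of the page outcome.
    public static string Describe(PageStatus s)
    {
        if (s.BlankPage)
            return "Page " + s.PageNumber + ": blank";
        if (!s.TextAvailable)
            return "Page " + s.PageNumber + ": no text extracted";
        string quality = s.ConfidenceScore >= MinConfidence ? "ok" : "low confidence";
        return "Page " + s.PageNumber + ": text extracted (" + quality + "), rotated "
            + RotationDegrees(s) + " degrees";
    }
}
```

In a real handler, the same checks would be applied directly to the StatusUpdateEventArgs instance passed to the StatusUpdate event.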
Example
Below is an example in C# where the above class has been used to overcome this issue:
using System;
// Also add the using directives for the Aquaforest OCR SDK namespaces referenced by your project.

class Program
{
    // Set by the StatusUpdate handler; holds the value from the most recent page event.
    static bool textAvailable = false;

    static void Main(string[] args)
    {
        try
        {
            Ocr _ocr = new Ocr();
            _ocr.License = "";
            PreProcessor _preProcessor = new PreProcessor();
            _ocr.EnableConsoleOutput = true;

            // Make the OCR binaries discoverable and point the engine at its resources.
            string OCRFiles = System.IO.Path.GetFullPath(@"..\..\..\..\..\..\bin");
            System.Environment.SetEnvironmentVariable("PATH", System.Environment.GetEnvironmentVariable("PATH") + ";" + OCRFiles);
            _ocr.ResourceFolder = OCRFiles;

            _preProcessor.Deskew = true;
            _preProcessor.Autorotate = false;
            _ocr.Language = SupportedLanguages.English;
            _ocr.EnablePdfOutput = true;

            // Subscribe before processing so that a status is received for each page.
            _ocr.StatusUpdate += OcrStatusUpdate;

            _ocr.ReadTIFFSource(System.IO.Path.GetFullPath(@"..\..\..\..\..\..\docs\tiffs\sample.tif"));
            if (_ocr.Recognize(_preProcessor))
            {
                string words = null;
                for (int j = 1; j <= _ocr.NumberPages; j++)
                {
                    try
                    {
                        // Only read the page text if the handler reported that text is available.
                        if (textAvailable)
                            words += _ocr.ReadPageString(j);
                    }
                    catch (Exception ex)
                    {
                        Console.WriteLine("ERROR reading page " + j + ": " + ex.Message);
                    }
                }
                _ocr.SavePDFOutput(System.IO.Path.GetFullPath(@"..\..\..\..\..\..\docs\tiffs\sample.pdf"), true);
            }
            _ocr.DeleteTemporaryFiles();
        }
        catch (Exception e)
        {
            Console.WriteLine("Error in OCR Processing :" + e.Message);
        }
    }

    // Records whether text was extracted for the page that has just been processed.
    private static void OcrStatusUpdate(object sender, StatusUpdateEventArgs statusUpdateEventArgs)
    {
        textAvailable = statusUpdateEventArgs.TextAvailable;
    }
}
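The single static textAvailable flag in the example holds only the value from the most recent StatusUpdate event. If the events for all pages are raised before the reading loop runs (an assumption about timing that the source does not confirm), the flag would reflect only the last page. One possible variation is to record the flag per page. The sketch below uses a hypothetical PageTextTracker helper that is not part of the SDK, and it assumes that PageNumber uses the same 1-based numbering as the loop index.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the SDK: remembers TextAvailable per page.
public class PageTextTracker
{
    private readonly Dictionary<int, bool> _textByPage = new Dictionary<int, bool>();

    // Call from the StatusUpdate handler with the event's PageNumber and TextAvailable.
    public void Record(int pageNumber, bool textAvailable)
    {
        _textByPage[pageNumber] = textAvailable;
    }

    // True only if a status was recorded for the page and text was extracted.
    public bool HasText(int pageNumber)
    {
        bool available;
        return _textByPage.TryGetValue(pageNumber, out available) && available;
    }
}
```

With this in place, the handler would call tracker.Record(statusUpdateEventArgs.PageNumber, statusUpdateEventArgs.TextAvailable), and the loop would test tracker.HasText(j) instead of the shared flag. Pages for which no status was received are treated as having no text, which avoids the ReadPageString call that leads to the FileNotFoundException.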