February 23, 2017

The Biggest Challenges of Content Findability

This post has been written by Aquaforest Guest Blogger, Agnes Molnar, Founder and Managing Consultant, Search Explained.

Enterprise Search is intended to help users find relevant and valuable content so they can get their jobs done.

However, when we focus on the key expression of this intent, “relevant and valuable content”, we instantly face the first challenge: how can “content value” be defined?

Content value

Basically, there are three factors that make content valuable:

Timeliness – In most cases, users want to get the latest version of documents (unless they explicitly specify otherwise). Usually, the older a document is, the less value it has.
Timeliness is critical for policies, procedure descriptions, document templates, manuals, etc. In other cases, finding old content has business value, for example with legal documents, invoices and archives.
Accuracy – Content is accurate if it is carefully prepared, precise, exact, and consistent with the company’s standards and rules. We can improve accuracy not only by improving the content itself but also by adding more, and accurate, metadata.
Completeness – Having all the needed information, and not missing any relevant data, makes a document complete. A complete document has integrity.
From a findability perspective, the more of the content that is available to read, crawl and index, the more complete the document can be. We should therefore make the whole content readable to support better findability. Readable content also makes better accuracy possible, which increases the benefits even more.

Improving content value for better findability

After defining these characteristics, the next question is how to enhance content value to support better findability.

To make it easier to find the latest, timely content, we have to make sure outdated, old, legacy and archive content is either removed from the search index (if it is absolutely unnecessary), or filtered out of the default user interface and only available on a separate page.
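As a rough illustration of this filtering idea, the sketch below splits a set of documents into a default view and a separate archive view. The `Doc` class, the `archived` flag and the three-year cutoff are assumptions made for the example, not features of any particular search product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Doc:
    title: str
    last_modified: date
    archived: bool = False  # hypothetical flag, e.g. set by a records manager


def split_for_search(docs, today, max_age_days=3 * 365):
    """Return (default_view, archive_view) for a list of Doc objects."""
    cutoff = today - timedelta(days=max_age_days)  # assumed age threshold
    default_view, archive_view = [], []
    for doc in docs:
        if doc.archived or doc.last_modified < cutoff:
            archive_view.append(doc)
        else:
            default_view.append(doc)
    # Newest first, so the latest version surfaces at the top.
    default_view.sort(key=lambda d: d.last_modified, reverse=True)
    return default_view, archive_view


docs = [
    Doc("Travel policy v3", date(2016, 11, 1)),
    Doc("Travel policy v1", date(2009, 5, 20)),
    Doc("Invoice 2015-0042", date(2015, 3, 2), archived=True),
]
current, archive = split_for_search(docs, today=date(2017, 2, 23))
```

Old content stays reachable through the archive view, which matters for the legal documents and invoices mentioned earlier; only the default view is restricted to recent material.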

To make the content more accurate, there are three things to do. First of all, we have to make sure its quality is good and valuable for (human) subject matter experts. Second, we have to make the content complete. Last but not least, we have to add as much relevant metadata as possible.

In many cases, completeness is the weakest point of content quality, especially when the document is a scanned, non-OCR’d picture or PDF file. In these cases, the content is there, readable and consumable for humans, but it doesn’t contribute to the document’s value because the content processing engine cannot read it. To improve completeness, the first thing to do is to make sure the content of the document is machine-readable.
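One simple way to spot such files is to check how much text a crawler managed to extract from them. The sketch below is a heuristic only: the `needs_ocr` function, the 50-characters-per-page threshold and the sample crawl results are all assumptions for illustration, and the text extraction itself is assumed to happen upstream.

```python
def needs_ocr(extracted_text, page_count, min_chars_per_page=50):
    """Guess whether a file is an image-only scan.

    extracted_text: text a crawler pulled from the file (may be empty).
    page_count: number of pages in the file.
    """
    if page_count <= 0:
        return False  # nothing to process
    visible = "".join(extracted_text.split())  # drop all whitespace
    return len(visible) / page_count < min_chars_per_page


# Hypothetical crawl results: (file name, extracted text, page count)
crawl = [
    ("contract_scan.pdf", "", 12),
    ("handbook.pdf", "Employee handbook. " * 200, 10),
    ("fax_0031.pdf", " \n ", 1),
]
ocr_queue = [name for name, text, pages in crawl if needs_ocr(text, pages)]
```

Files that land in `ocr_queue` are the ones a search engine can see but not read, so they are the natural candidates for the processing step described next.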

The key is: OCR

Making the content machine-readable is the key to better content completeness as well as accuracy. This can be done by processing the files with OCR (Optical Character Recognition) technology to create a text version of their contents. This enables the files to be searched and found.

Besides the obvious and immediate benefits of OCR-ing these documents, there is a common side effect, too: once the content is machine-readable, more metadata can be extracted and created automatically. Therefore, the accuracy of the document is boosted as well.
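To give a feel for what automatic metadata extraction can look like once text is available, here is a minimal sketch that pulls an invoice number and ISO-style dates out of OCR output with regular expressions. The patterns, the field names and the sample text are assumptions for the example; real documents usually need more tolerant rules.

```python
import re

# Assumed formats: ISO dates (YYYY-MM-DD) and "Invoice No. <id>" / "Invoice # <id>".
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
INVOICE_RE = re.compile(r"\bInvoice\s+(?:No\.?|#)\s*([A-Z0-9-]+)", re.IGNORECASE)


def extract_metadata(text):
    """Return a dict of metadata fields found in OCR'd text."""
    meta = {}
    match = INVOICE_RE.search(text)
    if match:
        meta["invoice_number"] = match.group(1)
    dates = DATE_RE.findall(text)
    if dates:
        meta["dates"] = dates
    return meta


# Hypothetical OCR output from a scanned invoice.
ocr_text = "INVOICE No. INV-2017-014\nIssued: 2017-02-01  Due: 2017-03-03"
metadata = extract_metadata(ocr_text)
```

None of these fields could be extracted from the scanned image alone, which is why OCR tends to improve accuracy as well as completeness.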

Don’t forget: search engines can only read text. They cannot understand images or the content creator’s intent. Therefore, creating and generating as much textual information as possible is essential for getting this content processed and making it findable. OCR technologies are here to help.

[If you want to learn more about the related technologies and available tools, please check Aquaforest SearchLight.]

Author

Neil Pitman

Head of IT Business Solutions

Neil established Aquaforest in 2001 to provide high-performance PDF, OCR, and SharePoint products to a worldwide market.
