Form design also important for OCR accuracy

Today, Feb 2001

Following simple design rules can maximize the performance of OCR and ICR in image processing.

When most companies implement a forms processing solution, their primary driving force is labor reduction: data captured automatically, without human intervention, equals money saved. Automation is achieved with optical character recognition (OCR) and intelligent character recognition (ICR), which read the machine print or handprint from a document image. Better OCR/ICR means more data captured automatically and greater cost savings.

Form Design

Designing forms according to a few simple rules maximizes OCR/ICR performance. Moreover, a form designed for easier data capture is usually easier for people to fill in as well. For example:

Make the form easy to fill out and keep the instructions clear and simple; this reduces crossings-out and other corrections.

Clearly define the data fields to encourage correctly formatted answers (e.g., indicate mm/dd/yyyy for dates). The OCR/ICR engine can then use the expected format to interpret the characters correctly.

Use as few different methods of collecting information as possible. These methods include multiple-choice questions, yes/no questions, constrained answers and unconstrained answers.
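To illustrate how a clearly defined field helps recognition, here is a minimal Python sketch that checks a recognized date against the mm/dd/yyyy hint printed on the form. The look-alike table and the name `parse_date_field` are hypothetical; how a particular OCR/ICR product applies field constraints is product-specific.

```python
import re
from datetime import date, datetime
from typing import Optional

# Hypothetical look-alike table: in a digits-only field, letters easily
# confused with digits are mapped to the digit they most likely represent.
DIGIT_LOOKALIKES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})

# The form prints "mm/dd/yyyy", so the field must have exactly this shape.
DATE_SHAPE = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")


def parse_date_field(recognized: str) -> Optional[date]:
    """Return a date if the recognized text fits mm/dd/yyyy, else None.

    None means the field cannot be trusted and should go to a human keyer.
    """
    candidate = recognized.strip().translate(DIGIT_LOOKALIKES)
    if not DATE_SHAPE.fullmatch(candidate):
        return None
    try:
        return datetime.strptime(candidate, "%m/%d/%Y").date()
    except ValueError:
        # Right shape but impossible calendar date, e.g. 02/30/2001.
        return None
```

For example, `parse_date_field("O2/l4/2001")` is corrected to `date(2001, 2, 14)` because the field is known to hold only digits, while `parse_date_field("2/14/2001")` is rejected for not following the printed format.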

Combining these rules with drop-out colors for character boxes (ink colors the scanner filters out, so only the writing remains) and clearly printed registration marks for de-skewing allows up to 95% of hand-printed characters and 99% of machine-printed characters to be read from a form, reducing labor costs proportionally. Legacy forms, of course, were not designed with these rules in mind, so their recognition rates are naturally lower. Even so, today's advanced image processing technology can often reduce costs enough to make a business case.
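Registration marks let software measure how far a page was rotated during scanning. A minimal sketch, assuming the form has two marks printed on the same horizontal line and that their centres have already been located in the scanned image (locating them is not shown): the skew is the angle of the line joining the marks, and undoing it is a rotation by the opposite angle. The function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def skew_angle(left_mark: Point, right_mark: Point) -> float:
    """Angle (radians) by which the line joining two registration marks,
    printed level on the form, is tilted in the scanned image."""
    dx = right_mark[0] - left_mark[0]
    dy = right_mark[1] - left_mark[1]
    return math.atan2(dy, dx)


def deskew_point(p: Point, origin: Point, angle: float) -> Point:
    """Rotate point p about origin by -angle, undoing the measured skew."""
    c, s = math.cos(-angle), math.sin(-angle)
    x, y = p[0] - origin[0], p[1] - origin[1]
    return (origin[0] + x * c - y * s, origin[1] + x * s + y * c)
```

This only maps coordinates (for example, the expected positions of character boxes); a production system would resample the whole image using the same angle.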

Balancing Savings Against Integrity

OCR/ICR performance cannot be summarized in a single figure, because the percentage of characters the engine believes it read correctly (i.e., read with high confidence) always differs from the percentage it actually read correctly.

Characters read with low confidence are known errors and can be handled by passing them to a human for confirmation or editing. Characters misread with high confidence are more serious, because there may be no way of knowing they are in the data.
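The routing of known errors can be sketched in a few lines of Python. This assumes the engine reports a per-character confidence score between 0.0 and 1.0 (scales differ between products) and uses a hypothetical rule: a field is accepted automatically only if every character meets the threshold. As the text notes, no threshold can catch a character misread with high confidence.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Field:
    name: str
    text: str
    confidences: List[float]  # one score per character; 0.0 to 1.0 assumed


def route_fields(fields: List[Field], threshold: float) -> Tuple[List[Field], List[Field]]:
    """Split fields into an auto-accept list and a human-review list.

    Any character below the threshold makes the whole field a known error
    that is sent to a person for confirmation or editing.
    """
    accepted: List[Field] = []
    review: List[Field] = []
    for field in fields:
        if all(score >= threshold for score in field.confidences):
            accepted.append(field)
        else:
            review.append(field)
    return accepted, review
```

Raising `threshold` moves more fields into the review list: fewer misreads slip through unnoticed, but more human confirmation is needed.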

Raising the character confidence threshold reduces the number of errors that go undetected, but it also increases the amount of human confirmation needed. Every forms processing system therefore has to trade data integrity against labor reduction. However, data validation can reduce the erroneous data in a particular field to nearly zero (e.g., checksum routines for credit card numbers, checking customer numbers against databases, postcode/address look-up). Validation is particularly powerful when the comparison is fuzzy, i.e., when some tolerance is allowed in the match.
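The article does not say which checksum or matching method a given product uses. As an illustration, the sketch below uses the Luhn (mod 10) check, the standard check-digit scheme for payment card numbers, which catches any single misread digit, and Python's `difflib.get_close_matches` as one simple form of fuzzy matching. The `TOWNS` list, the 0.8 cutoff and the assumption that reference values are stored in upper case are hypothetical.

```python
import difflib
from typing import List, Optional


def luhn_valid(number: str) -> bool:
    """Luhn (mod 10) check for card numbers; spaces and hyphens are ignored."""
    digits = [c for c in number if c not in " -"]
    if not digits or not all(c in "0123456789" for c in digits):
        return False  # empty, or contains a non-digit such as a misread letter
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Hypothetical reference list, e.g. loaded from a customer or address database.
TOWNS = ["LONDON", "LEEDS", "LIVERPOOL"]


def fuzzy_lookup(recognized: str, known_values: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known value if it is similar enough, else None."""
    matches = difflib.get_close_matches(recognized.upper(), known_values, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Here `fuzzy_lookup("l0ndon", TOWNS)` tolerates the misread zero and returns `"LONDON"`, while a value with no close match returns `None` and would be routed to a keyer.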

The Final Piece of the Puzzle

To complete the forms processing operation, all data not read by OCR/ICR must be handled as efficiently as possible while still ensuring maximum data integrity. This can be achieved with data entry software that embeds the same data validation as the OCR/ICR. For high-volume data processing, simple data-indexing modules are generally insufficient, because keyers cannot attain the required throughput. High-speed key-from-image and key-from-paper applications are better suited to high-volume situations.

With good form design and thorough data validation all the way down the line, forms processing can deliver accurate data with large cost savings over manual processes. Errors in forms processing can be avoided through solid planning before implementation and efficient follow-through afterward.

Thanks to Nadene Sayer of Neurascript Ltd. www.neurascript.com.

Copyright Association for Work Process Improvement Feb 2001
Provided by ProQuest Information and Learning Company. All rights reserved.

 
