Chapter 4
Training
Training is the process of changing the OCR solutions assigned to
character shapes in the image. It is useful for uniformly degraded
documents or when an unusual typeface is used throughout a document.
Training is less useful for texts with random distortions. Here is an
example, based on the letter “g”, which can be printed in different ways:
[Figure: four printed examples containing the letter “g”]
The first two examples do not need training, because both shapes are
normal for the letter “g” and the program can handle them. The third
example could benefit from training because the shape of “g” is unusual,
and all instances of “g” in the text are likely to look like this. The fourth
example is not good for training, because the first “g” is poorly printed, and
this shape is unlikely to appear again in the document.
You can use training to improve recognition of special symbols such as @,
® and ©, or to recognize supported accented letters more reliably.
Training is not intended to teach the program to read characters from
unsupported languages or alphabets.
OmniPage Pro 12 offers two types of training: manual training and
automatic training (IntelliTrain). Data from both types of training is
combined and can be saved to a training file.
When you leave a page on which training data was generated, you will be
asked how to apply it to other existing pages in the document.
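As a purely conceptual illustration of how training data from the two
sources might be combined, the short Python sketch below models training
data as a table that maps a character shape to its OCR solution. It does not
show how OmniPage Pro stores or merges training data. The names
TrainingData, train and combine are hypothetical, and so is the rule that a
manually trained solution takes precedence over an automatic one.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingData:
    """Hypothetical store: maps a character-shape key to an OCR solution."""
    solutions: dict[str, str] = field(default_factory=dict)

    def train(self, shape_key: str, solution: str) -> None:
        # Assign (or reassign) the OCR solution for one character shape.
        self.solutions[shape_key] = solution


def combine(manual: TrainingData, automatic: TrainingData) -> TrainingData:
    """Merge both kinds of training data into one set.

    Assumption for this sketch only: when both sources trained the same
    shape, the manual solution wins.
    """
    merged = dict(automatic.solutions)
    merged.update(manual.solutions)  # manual entries override automatic ones
    return TrainingData(merged)
```

In this sketch, training the same unusual “g” shape both manually and
automatically yields a single combined entry, which is the kind of data that
could then be written to a training file.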
Manual training
To do manual training, place the insertion point in front of the character
you want to train, or select a group of characters (up to one word) and
choose Train Character... from the Tools menu or the shortcut menu. You
will see an enlarged view of the character(s) to be trained, along with the
current OCR solution. Change this to the desired solution and click OK.
The program takes this training and examines the rest of the page. If it