Creating an OCR Language

When performing OCR on a document, ABBYY FineReader uses information about the document's language (select this language from the Document Languages drop-down list in the Document window). If the text contains many unusual words or abbreviations, the program may fail to recognize them correctly. In this case, you may want to create your own recognition language for the document.

  1. From the Tools menu, select Language Editor….
  2. In the Language Editor dialog box, click New….
  3. In the New Language or Group dialog box, select Create a new language based on an existing one, select the desired language from the drop-down list below, and click OK.
  4. In the Language Properties dialog box, specify the properties of the new OCR language.
    1. Language name — Type a name for your OCR language in this field.
    2. Source language — The language on which your new OCR language will be based. (Displays the language you selected in the New Language or Group dialog box. Click the arrow to the right to select a different language.)
    3. Alphabet — Lists the characters of the alphabet of the source language. Click the ... button to add or remove characters.
    4. Dictionary — The dictionary that ABBYY FineReader will use to perform OCR on your document and to check the recognized text. The following options are available:

      • None
        No dictionary will be used.
      • Built-in dictionary
        The dictionary supplied with ABBYY FineReader will be used.
      • User dictionary
        A user dictionary will be used. Click the Edit… button to add words to the dictionary or to import an existing user dictionary or text file in Windows (ANSI) or Unicode encoding. The words in the text file you wish to import must be separated by spaces or other non-alphabetic characters.

        Note. The words from the user dictionary may occur in the recognized text in the following capitalizations: 1) lowercase only, 2) uppercase only, 3) first letter capitalized, 4) as spelt in the user dictionary. The four possibilities are summed up in the table below.

        Word as spelt in the user dictionary    Possible occurrences of the word in the text
        abc                                     abc, Abc, ABC
        Abc                                     abc, Abc, ABC
        ABC                                     abc, Abc, ABC
        aBc                                     aBc, abc, Abc, ABC
      • Regular expression
        You can use a regular expression to create a new language.
        For details, see Regular Expressions.
    5. Advanced… — Opens the Advanced Language Properties dialog box, where you can specify more advanced properties for your language:

      • Non-letter characters that may occur at the beginning or at the end of words
      • Standalone non-letter characters (punctuation marks, etc.)
      • Characters to be ignored if they occur inside words
      • Prohibited characters that may never occur in texts written in this language
      • All the characters of the language that will be recognized
      • Whether the text may contain Arabic numerals, Roman numerals, and abbreviations
  5. Once you have finished creating your new language, select it as the recognition language for your document.

    For details, see Document Languages.
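
The two user dictionary rules from step 4 (imported text is split into words at non-alphabetic characters, and an entry may appear in four capitalizations) can be modeled in a few lines. The following is a minimal Python sketch for illustration only, not a FineReader API: the helper names split_words, possible_forms, and may_occur are hypothetical, and approximating "alphabetic" with Python's str.isalpha() is an assumption (FineReader may instead rely on the alphabet defined for the language).

```python
from itertools import groupby


def split_words(text):
    """Split imported text into dictionary words (hypothetical helper).

    Any run of non-alphabetic characters (spaces, punctuation, digits)
    acts as a separator. Assumption: "alphabetic" is approximated with
    str.isalpha(), not with the language's own alphabet.
    """
    return ["".join(run) for is_letter, run in groupby(text, key=str.isalpha) if is_letter]


def possible_forms(entry):
    """Return the spellings in which a user dictionary entry may occur."""
    return {
        entry.lower(),       # 1) lowercase only
        entry.upper(),       # 2) uppercase only
        entry.capitalize(),  # 3) first letter capitalized, the rest lowercase
        entry,               # 4) as spelt in the user dictionary
    }


def may_occur(recognized, entry):
    """True if `recognized` is an allowed capitalization of `entry`."""
    return recognized in possible_forms(entry)


if __name__ == "__main__":
    # Rebuild the capitalization table from a small "imported" text.
    for entry in split_words("abc Abc, ABC; aBc"):
        print(f"{entry:<5} -> {', '.join(sorted(possible_forms(entry)))}")
```

A set is used because the four rules can produce the same spelling more than once (for abc, rules 1 and 4 coincide), which is why most rows of the capitalization table list only three forms.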

By default, user languages are saved in the ABBYY FineReader document folder. To change this folder, select Tools>Options…, click the Advanced tab, and specify a new folder under User languages folder.