Providing the full spectrum of document, content, & imaging solutions
Key Features:

  • Scanning and compression
  • Viewing, annotation, and printing
  • Image editing and color image processing
  • PDF and vector support
SDK Support Offered For:
  • DLL
  • Unix
  • Linux
  • ActiveX 32-bit
OCR
(Available in DLL 32-bit)
OCR Language Support
  • Includes support for over 100 different languages
  • Support for Asian Languages (Chinese, Korean and Japanese)
  • Recognizes characters from multiple languages within a single image
  • For a complete list of supported languages, please see our product documentation.
OCR Zone Based Processing
Auto-Zoning (Segmentation):
  • Automatically segments the page into individual zones for processing.
  • Located Zones are assigned a type based on expected content: Flow, Table, Graphic.
  • Improves recognition results and performance by removing image areas from the page before the OCR operation.
  • Advanced table detection improves the reconstruction of tabular data in the results.
User Defined Zones:
  • Process an entire image or individual region of the page.
  • Zones can be defined on the fly by a user, loaded from a file, or detected automatically by the engine.
  • Flexible API gives developers the ability to define areas of images to be processed and the type of content located in that defined area.
  • Apply advanced Data Checking zone by zone.
  • Specialized content? Define the appropriate recognition module for your content.
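
As a rough illustration of the zone concept described above, the following Python sketch models a zone as a rectangle plus a content type. It does not use the ImageGear API: the `Zone` class, the `ZoneType` enum, and the `crop` and `zones_to_recognize` helpers are hypothetical names chosen only to show how a page region and its expected content might be described.

```python
from dataclasses import dataclass
from enum import Enum


class ZoneType(Enum):
    # Content types named in the text; this enum itself is hypothetical.
    FLOW = "flow"
    TABLE = "table"
    GRAPHIC = "graphic"


@dataclass(frozen=True)
class Zone:
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive
    zone_type: ZoneType


def crop(page, zone):
    """Return the part of `page` (a list of pixel rows) that lies inside `zone`."""
    return [row[zone.left:zone.right] for row in page[zone.top:zone.bottom]]


def zones_to_recognize(zones):
    """Skip graphic zones, since they hold no text to recognize."""
    return [z for z in zones if z.zone_type is not ZoneType.GRAPHIC]


# A 4x6 dummy page whose pixel values encode their own (row, column).
page = [[(y, x) for x in range(6)] for y in range(4)]
zones = [
    Zone(0, 0, 3, 2, ZoneType.FLOW),
    Zone(3, 0, 6, 4, ZoneType.GRAPHIC),
]
text_zones = zones_to_recognize(zones)
region = crop(page, text_zones[0])
```

In this sketch only the flow zone is passed on for recognition, which mirrors the idea of removing image areas before OCR.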
OCR Editions

OCR Language Options
The ImageGear Professional DLL functionality supports two sets of languages, Western and Asian. These language options are licensed separately for development and deployment. The Asian option includes basic support for Western characters, but it does not use dictionaries to improve results for them. If you need support for both sets of languages, please contact us.

OCR Deployment Options
Language support for the ImageGear Professional DLL functionality is available in two different editions for distribution, Standard and Plus. The primary difference between these two versions is the list of output formats created by the OCR engine.

Standard Edition:
The PDF formatting in the Standard Edition is accomplished using text output reported by the OCR engine and ImageGear's internal PDF engine.

Standard Edition Output Formats:
  • Searchable text PDF files
  • Text documents

Plus Edition:
Formatted output is created by using all of the recognition information (font detail, located image areas, and recognized table structure information) to reconstruct a representation of the original document. The Plus Edition relies on the OCR engine itself to produce this formatted output.

Plus Edition Output Formats:

  • Searchable text PDF files
  • Text documents
  • Word
  • Excel
  • HTML
  • and more

See product documentation for full details.

OCR Image Pre-Processing
  • Advanced Image Processing methods are available to improve OCR accuracy.
  • Auto Inversion functionality detects if the image needs to be inverted for highest accuracy.
  • Automatic image orientation detects and adjusts images so they are properly oriented.
  • Deskew methods will detect image misalignment and automatically correct it, improving segmentation and recognition accuracy.
  • Despeckling methods remove minor dots and imperfections introduced during the image capture process.
  • Resolution enhancement can be performed to improve the quality of low-resolution images.
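
To make two of the steps above concrete, here is a minimal Python sketch of auto inversion and despeckling on a binary image. It is not ImageGear code and makes no claim about how the engine works internally. The function names are hypothetical, the image is assumed to be a list of rows with 1 for black and 0 for white, and the inversion rule (a page that is more than half black is inverted) is a simplifying assumption.

```python
def needs_inversion(image):
    """Guess whether a binary image (1 = black, 0 = white) is white-on-black.

    Assumption: text normally covers less than half of a page, so a page
    that is mostly black is treated as inverted.
    """
    total = sum(len(row) for row in image)
    black = sum(sum(row) for row in image)
    return black * 2 > total


def invert(image):
    """Swap black and white pixels."""
    return [[1 - px for px in row] for row in image]


def despeckle(image):
    """Clear black pixels that have no black neighbor (8-connectivity)."""
    height = len(image)
    width = len(image[0]) if image else 0
    out = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            has_neighbor = any(
                image[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbor:
                out[y][x] = 0
    return out


# White-on-black sample: a two-pixel stroke plus one isolated speck.
scan = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
]
if needs_inversion(scan):
    scan = invert(scan)
cleaned = despeckle(scan)
```

After inversion the stroke in the second row is kept, while the lone pixel in the bottom-right corner is removed as a speck.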
OCR Data Checking
  • A complete Checking Subsystem is provided to improve recognition accuracy.
  • Advanced spell checking using 17 different language dictionaries is provided. Each dictionary contains between 100,000 and 200,000 entries.
  • Vertical dictionaries improve spell checking and OCR accuracy for Medical and Legal industries.
  • Customize validation by defining User Dictionaries with values specific to your needs.
  • Validate results using Regular Expressions.
  • Developers can use event-based checking to start a validation workflow.
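
The kind of checking described above can be sketched in Python with the standard `re` module. This is only an illustration and not the ImageGear Checking Subsystem: the user dictionary contents, the date pattern, and all function names are hypothetical.

```python
import re

# Hypothetical user dictionary of domain terms that should be accepted as-is.
USER_DICTIONARY = {"invoice", "total", "imagegear"}

# Hypothetical pattern for a zone expected to hold a date such as 2024-01-31.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_word(word):
    """Return True if the word is in the user dictionary (case-insensitive)."""
    return word.lower() in USER_DICTIONARY


def check_date_zone(text):
    """Return True only if the whole zone text matches the date pattern."""
    return DATE_PATTERN.fullmatch(text.strip()) is not None


def suspicious_words(words):
    """List recognized words that fail the dictionary check."""
    return [w for w in words if not check_word(w)]


flagged = suspicious_words(["Invoice", "T0tal", "ImageGear"])
date_ok = check_date_zone(" 2024-01-31 ")
date_bad = check_date_zone("2O24-01-31")  # letter O instead of a zero
```

Here `T0tal` is flagged because it is not in the dictionary, and the second date fails because the letter `O` is not a digit, a typical OCR confusion that a regular expression can catch.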
OCR Result Processing
Recognition Details:
  • Each character is returned with an accuracy Confidence Value.
  • Separate word confidence values provide additional accuracy indication.
  • Advanced font information and location information allow ImageGear to create text representations of the original, with a similar layout.
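
One way an application might use character and word confidence values is to route uncertain words to manual review. The Python sketch below assumes a 0.0 to 1.0 confidence scale and uses hypothetical `RecognizedChar` and `RecognizedWord` types and thresholds; the actual value ranges and data structures are described in the product documentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognizedChar:
    char: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


@dataclass
class RecognizedWord:
    chars: List[RecognizedChar]
    confidence: float  # word-level value, reported separately from the characters

    @property
    def text(self):
        return "".join(c.char for c in self.chars)


def needs_review(word, char_threshold=0.5, word_threshold=0.7):
    """Flag a word if it, or any single character in it, scores too low."""
    if word.confidence < word_threshold:
        return True
    return any(c.confidence < char_threshold for c in word.chars)


words = [
    RecognizedWord(
        [RecognizedChar("T", 0.98), RecognizedChar("0", 0.41), RecognizedChar("t", 0.95)],
        confidence=0.80,
    ),
    RecognizedWord(
        [RecognizedChar("O", 0.97), RecognizedChar("K", 0.99)],
        confidence=0.96,
    ),
]
to_review = [w.text for w in words if needs_review(w)]
```

The first word is flagged even though its word-level value is acceptable, because one character falls below the character threshold.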
Language Control
  • The ImageGear OCR engine processes all data in a Unicode format. The data output can be formatted for a specific code page.
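
The general idea of formatting Unicode data for a specific code page can be shown with Python's built-in codecs. This sketch is independent of ImageGear: the `format_for_code_page` helper is hypothetical, and replacing unmappable characters with `?` is one possible policy, not necessarily what the engine does.

```python
def format_for_code_page(text, code_page):
    """Encode Unicode text for a target code page.

    Characters that the code page cannot represent are replaced with '?'.
    """
    return text.encode(code_page, errors="replace")


recognized = "Caf\u00e9 \u6771\u4eac"  # "Café" followed by "Tokyo" in Japanese

# Windows-1252 (Western) keeps the accented letter but not the Japanese text.
western = format_for_code_page(recognized, "cp1252")

# Code page 932 (Japanese) can represent the Japanese characters.
japanese = format_for_code_page("\u6771\u4eac", "cp932")
```

Keeping the data in Unicode internally and converting only at output time means a single recognition result can be written for several target code pages.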
Multiple Output Format Options
  • Image over PDF
  • Text based PDF
  • Microsoft Office 2007
  • Microsoft Office 97 (Word, Excel & PowerPoint)
  • RTF
  • HTML
  • XML
  • For a complete list of output formats, please see the product documentation.