Technical FAQs

Question

What is the proper way of using affinity tokens in cluster mode where multiple file IDs using multiple affinity tokens need to be combined?

Answer

If you are using PrizmDoc Server in cluster (multi-server) mode, and you are using Content Conversion Services to merge multiple files into one, or whenever multiple file ids using multiple affinity tokens need to be combined; your requests need to use a single affinity token. Because affinity tokens need to go in the header, you might think you are required to include all/both of the files’ affinity tokens in the header.

If you find yourself in this situation, the correct method is to re-use the first affinity token you get for all subsequent resources you create. For example, if you create a work file, you’ll get an affinity token back in the response. That affinity token needs to be set in the Accusoft-Affinity-Token request header of any subsequent resources (work files, content converter, viewing sessions, etc.) that you create later and want to use together.

An example is located here:

https://help.accusoft.com/PrizmDoc/latest/HTML/affinity-tokens-and-cluster-mode.html

The main takeaway here is that the initial request that is made to the server for a workfile will return an affinity token. This very same affinity token must be used in the header Accusoft-Affinity-Token for all subsequent requests in this conversion/stitching process.

The most relevant quote from that page is:

“In cluster mode, the PrizmDoc Server API will automatically generate an affinity token when it receives a POST request for a new ViewingSession, WorkFile, MarkupBurner, RedactionCreator, or ContentConverter resource and return it in the response. Once you have obtained an affinity token, you will need to pass this in with related requests using the Accusoft-Affinity-Token HTTP custom header.”

Here is a separate custom example of stitching two TIFF images together by converting them to a PDF.

First TIFF image

Request with no affinity token:

POST /PCCIS/V1/WorkFile HTTP/1.1
Host: prizmdocservername:18681
Content-Type: application/octet-stream

Response:

{
    "fileId": "I3GRFEfrw_K8fX4VJ7Z1bQ",
    "fileExtension": "tif",
    "affinityToken": "ZSTudgjA42h1CVCj0KkGuYiKn5nEFhmFrvA0AkMxDxc="
}

Second TIFF image

Request:

POST /PCCIS/V1/WorkFile HTTP/1.1
Host: prizmdocservername:18681
Content-Type: application/octet-stream
Accusoft-Affinity-Token: ZSTudgjA42h1CVCj0KkGuYiKn5nEFhmFrvA0AkMxDxc=

Response:

{
    "fileId": "I-CTRdFnaL8FLNQDUawTHw",
    "fileExtension": "tif",
    "affinityToken": "ZSTudgjA42h1CVCj0KkGuYiKn5nEFhmFrvA0AkMxDxc="
}

Content Conversion

Request:

POST /v2/contentConverters HTTP/1.1
Host: prizmdocservername:18681
Content-Type: application/json
Accusoft-Affinity-Token: ZSTudgjA42h1CVCj0KkGuYiKn5nEFhmFrvA0AkMxDxc=

{
    "input": {
        "sources": [
            { 
                "fileId": "I3GRFEfrw_K8fX4VJ7Z1bQ"
            },
            { 
                "fileId": "I-CTRdFnaL8FLNQDUawTHw"
            }
        ],
        "dest": {
            "format": "pdf"
        }
    }
}

Response:

{
    "input": {
        "dest": {
            "format": "pdf",
            "pdfOptions": {
                "forceOneFilePerPage": false
            }
        },
        "sources": [
            {
                "fileId": "I3GRFEfrw_K8fX4VJ7Z1bQ",
                "pages": ""
            },
            {
                "fileId": "I-CTRdFnaL8FLNQDUawTHw",
                "pages": ""
            }
        ]
    },
    "expirationDateTime": "2018-10-03T19:12:52.005Z",
    "processId": "1u6k5Y_l7yRfhWyfL1t4Yw",
    "state": "processing",
    "percentComplete": 0,
    "affinityToken": "ZSTudgjA42h1CVCj0KkGuYiKn5nEFhmFrvA0AkMxDxc="
}

Content Conversion Request:

/v2/contentConverters/{processId}

GET /v2/contentConverters/1u6k5Y_l7yRfhWyfL1t4Yw HTTP/1.1
Host: prizmdocservername:18681
Accusoft-Affinity-Token: ZSTudgjA42h1CVCj0KkGuYiKn5nEFhmFrvA0AkMxDxc=

Content Conversion Complete Response:

{
    "input": {
        "dest": {
            "format": "pdf",
            "pdfOptions": {
                "forceOneFilePerPage": false
            }
        },
        "sources": [
            {
                "fileId": "I3GRFEfrw_K8fX4VJ7Z1bQ",
                "pages": ""
            },
            {
                "fileId": "I-CTRdFnaL8FLNQDUawTHw",
                "pages": ""
            }
        ]
    },
    "expirationDateTime": "2018-10-03T19:12:52.005Z",
    "processId": "1u6k5Y_l7yRfhWyfL1t4Yw",
    "state": "complete",
    "percentComplete": 100,
    "output": {
        "results": [
            {
                "fileId": "tK4UbzryHWFoqOC6JJAjAg",
                "sources": [
                    {
                        "fileId": "I3GRFEfrw_K8fX4VJ7Z1bQ",
                        "pages": "1"
                    },
                    {
                        "fileId": "I-CTRdFnaL8FLNQDUawTHw",
                        "pages": "1"
                    }
                ],
                "pageCount": 2
            }
        ]
    }
}

Download The WorkFile:

/PCCIS/V1/WorkFile/{fileId}

GET /PCCIS/V1/WorkFile/1u6k5Y_l7yRfhWyfL1t4Yw HTTP/1.1
Host: prizmdocservername:18681
Accusoft-Affinity-Token: ZSTudgjA42h1CVCj0KkGuYiKn5nEFhmFrvA0AkMxDxc=
Question

We are in the process of converting multiple Microsoft Office documents to PDF files and noticed a few PowerPoint files are returning the following:

“errorCode”: “CouldNotConvert”.

What could be causing this issue?

Answer

This issue can occur specifically with PowerPoint files when using PrizmDoc’s MSO feature or PrizmDoc Cloud. When a document is marked as final, it becomes non-editable and the MS Office API does not allow the document to be edited and returns “errorCode”: “CouldNotConvert”.

Currently the only work around is to open the PowerPoint file in its native application and remove the Mark as Final flag and save the file.

At the time of writing, there exists a pending feature request for the ability to remove this flag automatically. The feature request can be seen here.

Question

After searching a document, an error icon appears in the search results panel. Clicking on it displays the following error message: “x page(s) cannot be searched.” Why does this occur and how can I find out which specific pages couldn’t be searched?

Answer

When the PrizmDoc Viewer text-service cannot find any text for a given page in the document, it provides an array of all the pages without text in the response from searchTask results.

In short, the document is fine and simply contains pages without text. If you look at the pagesWithoutText array contained within the response data from searchTasks, you’ll see something like this:

[0, 1, 7, 17, 43, 45, 65, 67, 77, 79,…]

The values reported are pages that do not contain any text but instead are either blank or contain an image. This data can then be used to inform the user of how many pages are not searchable.

Question

I am trying to deploy my ImageGear Pro ActiveX project and am receiving an error stating

The module igPDF18a.ocx failed to load

when registering the igPDF18a.ocx component. Why is this occurring, and how can I register the component correctly?

Answer

To Register your igPDF18a.ocx component you will need to run the following command:

regsvr32 igPDF18a.ocx

If you receive an error stating that the component failed to load, then that likely means that regsvr32 is not finding the necessary dependencies for the PDF component.

The first thing you will want to check is that you have the Microsoft Visual C++ 10.0 CRT (x86) installed on the machine. You can download this from Microsoft’s site here:

https://www.microsoft.com/en-us/download/details.aspx?id=5555

The next thing you will want to check for is the DL100*.dll files. These files should be included in the deployment package generated by the deployment packaging wizard if you included the PDF component when generating the dependencies. These files must be in the same folder as the igPDF18a.ocx component in order to register it.

With those dependencies, you should be able to register the PDF component with regsvr32 without issue.

Question

What file types are supported by Accusoft PDF Viewer?

Answer

The viewer currently supports only PDF file formats based on the PDF32000 specification. If you need more wide ranging document support our PrizmDoc Server platform can help!

Question

After searching a document, an error icon appears in the search results panel. Clicking on it displays the following error message: “x page(s) cannot be searched.” Why does this occur and how can I find out which specific pages couldn’t be searched?

Answer

When the PrizmDoc Viewer text-service cannot find any text for a given page in the document, it provides an array of all the pages without text in the response from searchTask results.

In short, the document is fine and simply contains pages without text. If you look at the pagesWithoutText array contained within the response data from searchTasks, you’ll see something like this:

[0, 1, 7, 17, 43, 45, 65, 67, 77, 79,…]

The values reported are pages that do not contain any text but instead are either blank or contain an image. This data can then be used to inform the user of how many pages are not searchable.

Understanding the Value of Third-Party Software Integrations
 

Today’s customers expect more of software applications than ever before. Piecemeal solutions that provide only a few noteworthy features are quickly being overtaken by more comprehensive platforms that deliver an end-to-end experience for users. This has prompted developers to incorporate more capabilities, while also building innovative features that set their solutions apart from the competition. Thanks to third-party software integrations, they’re able to meet both demands.

What is Third-Party Software Integration?

Third-party software integrations typically come in the form of SDKs or APIs that provide applications with specialized capabilities. Rather than building complex features like optical character recognition (OCR), PDF features, or image cleanup from scratch, developers can instead incorporate the necessary features directly into their software via an SDK or use an API call to access capabilities without expanding their application’s footprint.

From a user experience standpoint, third-party software integrations allow developers to build more cohesive software solutions that provide all the essential features a customer may require. Instead of pushing them into a separate application to interact with documents, provide a signature, or fill out a digital form, they can instead deliver an unbroken experience that’s easier to navigate and manage from start to finish.  

4 Key Third-Party Software Benefits

There are a number of important benefits organizations can gain from using third-party software integrations, but four stand out in particular:

1. Reduce Development Costs

When evaluating whether it makes sense to build functionality for an application in-house or buy a third-party software integration, cost is frequently one of the key considerations. There is often a tendency to think that it would be more cost-effective to have developers already working on the project simply build the capabilities they need on their own. After all, there’s no shortage of open-source SDKs and other tools that are available without having to pay licensing or product fees.

In practice, however, this approach usually ends up being more expensive in the long run. That’s because the developers working on the project often lack the experience needed to build those capabilities quickly. A software engineer hired to help build AI software, for instance, probably doesn’t know a lot about file conversion or annotation. While they might be able to find an open-source tool to build those features, they still need to do quite a bit of development work and on-the-job learning to get the new capabilities stood up and thoroughly tested. 

Focusing on these features means they’re not focusing on the more innovative aspects of their application. From a cost standpoint, that means they’re being paid to build something that’s already readily available in the market. When these internal development costs are taken into account, it’s almost always more cost effective to buy ready-to-implement software features built by an experienced third party. As the saying goes, there’s no reason to reinvent the wheel. 

2. Get to Market Faster

Software developers are always working against the clock. With new applications hitting the market faster than ever, there’s tremendous pressure to keep development timelines on track and avoid missing important deadlines. This helps projects stay within their expected budgets and prevents potential competitors from getting to market faster. Any steps that can be taken to accelerate development and potentially shorten the timeline to releasing a product could mean the difference between becoming an industry innovator or being labeled as an also-ran.

Third-party software integrations allow developers to quickly and seamlessly integrate essential capabilities into applications without compromising their project timeline. Rather than building features like forms processing, document annotation, and image conversion from scratch, teams can instead use third-party SDKs and APIs to add proven, reliable, and secure features in a fraction of the time. By keeping projects on or ahead of schedule, they can focus on delivering a better, more robust product that exceeds customer expectations. 

3. Expand Application Features & Functionality

Software development teams typically possess the experience and expertise needed to build the core architecture and innovative features of a new application. In many cases, they’re designing something novel that will provide a point of differentiation in the market. The more time they can spend on refining and expanding those capabilities, the more likely the application is to make an impact and win over customers.

What these developers often lack, however, are the skills needed to implement a variety of other features that will enhance the application’s functionality. Features like document conversion, OCR, PDF support, digital forms, eSignature, and image compression are complex and difficult to build from scratch. By integrating third-party software, developers can leverage proven, feature-rich technology to expand their application’s capabilities. This not only allows them to improve their solution’s versatility but also enhance the overall user experience by eliminating the need for external programs or troublesome plug-ins. 

4. Access Specialized Engineering Support

Incorporating features like PDF support, image conversion, and document redaction into an application poses several challenges. Some of those challenges don’t show up right away, instead, they become evident long after a software product launches. If the developers don’t have a lot of experience with the technology behind those features, minor issues can quickly escalate into serious problems that leave customers unhappy and willing to look elsewhere for alternatives. No organization wants to be caught in a situation where a bug embedded in an open-source tool renders a client’s valuable assets unusable.

By leveraging proven, tested, and secure third-party software integrations, developers gain access to support from experienced engineering teams with deep knowledge of their solutions. In addition to documentation and code samples, they can also speak directly with developers who can provide guidance on how to best integrate features and resolve issues when they emerge. The best integration providers will even work with organizations to customize their solutions to meet specific application needs, which helps create even smoother user experiences and enhances reliability.

Integrating Third-Party Software with Accusoft

For over 30 years, Accusoft has helped organizations add essential features like barcode recognition, file conversion, document assembly, and image compression to their applications through an innovative line of SDKs and APIs. Our document lifecycle technologies are backed by multiple patents and have been incorporated successfully into a wide range of applications. Our dedicated engineers provide ongoing support and work closely with customers to implement their specific use cases, ensuring that their software platform is delivering the best possible experience.

To learn more about integrating third-party software with Accusoft SDKs and APIs, talk to one of our solutions experts today.

Question

How can I tell which server has the cache for a specific document in a clustered PrizmDoc environment?

Answer

When a document is viewed, it creates a Viewing Session ID. That Viewing Session ID has information regarding which server in the cluster is doing the work, however, it is encoded and cannot be read directly.

In order to determine which PrizmDoc cluster server is doing the work for a specific document in a specific viewing session, you can do a text search for the Viewing Session ID in the plb.sep_multi.log on all servers. Only one server in the cluster will be a match for that Viewing Session ID.

Today’s organizations gather information from a variety of sources. Structured forms remain one of the most popular tools for collecting and processing data, and anyone who has filled out such a form recently has likely encountered the familiar bubbles or squares used to indicate some form of information. Whether these marks are used to identify marital status, health conditions, education level, or some other parameter, optical mark recognition plays an important role in streamlining forms processing and data capture.

What is Optical Mark Recognition?

Optical mark recognition (OMR) reads and captures data marked on a special type of document form. In most instances, this form consists of a bubble or a square that is filled in as part of a test or survey. After the form is marked, it can either be read by dedicated OMR software or fed into a physical scanner device that shines a beam of light onto the paper and then detects answers based on how much light is reflected back to an optical sensor. Older OMR scanners detected answers by measuring how much light passed through the paper itself using phototubes on the other side. Since the phototubes were very sensitive, #2 pencils often had to be used when filling out forms to ensure an accurate reading.

Today’s OMR scanners are much more accurate and versatile, capable of reading marks regardless of how they’re filled out (although they struggle if the mark is made with the same color as the printed form). More importantly, OMR software has made it possible to capture data from OMR forms without the need for any special equipment. This is especially helpful for processing forms information that exists in digital format, such as PDF files or JPEG images. 

The History of Optical Mark Recognition

One of the oldest versions of forms processing technology, OMR dates back to the use of punch cards, which were first developed in the late 1800s for use with crude “tabulating” machines. The cards typically provided simple “yes/no” information based on whether or not a hole was punched out. When fed through the tabulating machine, a hole would be registered and counted. This same basic principle would allow more complex machines to perform basic arithmetic in the early 1900s before serving as the foundation for early computer programming by mid-century. Entire computer programs were stored on stacks of punch cards, which would remain in use until well into the 1970s when more powerful machines made them obsolete.

Although OMR operates on the same principle as a punch card, it instead uses scanning technology to detect the presence of a mark made by a pencil or a pen. This form of identification was first popularized by IBM’s electrographic “mark sense” technology in the 1930s and 1940s. The concept itself was first developed by a schoolteacher named Reynold Johnson, who wanted to streamline test grading. He designed a machine that could read pencil marks on a special test paper and then tabulate the marks to generate a final score. After joining IBM in 1934, Johnson spearheaded the development of the Type 805 Test Scoring Machine, which debuted in 1938 and revolutionized test scoring in the education sector. In production until 1963, the 805 could score 800 sheets per hour when run by an experienced operator.

The 805 registered marks by using metal brushes to sense the electrical conductivity of graphite from the pencil lead. While effective, it had limitations in terms of reading speed and flexibility. When Everett Franklin Lindquist, best known as the creator of the ACT, needed a machine that could keep up with Iowa’s widespread adoption of standardized testing in the 1950s, he developed the first true optical mark reader. Patented in 1962, Lindquist’s machine detected marks by measuring how much light passed through a scoring sheet and was capable of scoring 4,000 tests per hour.

Throughout the 1960s, OMR scanning technology continued to improve and spread to a variety of industries looking for ways to rapidly process data. In education, however, the OMR market would soon be dominated by the Scantron Corporation, which was founded in 1972 to market smaller, less expensive scanners to K-12 schools and universities. After placing the scanners in educational institutions, Scantron then sold large quantities of proprietary test sheets that could be used for a variety of testing purposes. Scantron was so successful that their distinctive green and white sheets have become synonymous with OMR scanning for generations of US college students.

The next major innovation in OMR technology arrived in the early 1990s with dedicated OMR software that could replicate the drop-out capabilities of commercial scanners. Part of the reason why scanners used proprietary, pre-printed forms was so they could use colors and watermarks that would not register during scanning for more accurate reading. Thanks to OMR software, it became possible to create templated forms and then remove the form image during the reading process to ensure that only marked information remained.

Take Control of OMR Forms with Accusoft SDKs

Accusoft’s FormFix forms processing SDK features powerful production-level OMR capabilities. It not only detects the presence of check or bubble marks, but can also detect markings in form fields, which is particularly useful for determining whether or not a signature is present on a document. Capable of reading single or multiple marks at 0, 90, 180, and 270 degree orientations, FormFix can also recognize checkboxes and be programmed to accommodate a variety of bubble shapes. Its form drop-out and image cleanup features also help to ensure the highest level of accuracy during OMR reading.

For expanded forms functionality, including optical character recognition (OCR) and intelligent character recognition (ICR), developers can also turn to FormSuite for Structured Forms. Featuring a comprehensive set of forms template creation tools and data capture capabilities, FormSuite can streamline forms processing workflows and significantly reduce the costs and errors associated with manual data entry and extraction.

Find out what flexible OMR functionality can do for your application with a fully-featured trial of the FormSuite SDK. Get started with some functional sample code and explore FormFix’s features to start planning your integration.