Technical FAQs
Redacting documents is critically important for legal departments and government agencies. By removing sensitive information from a digital file before sharing it publicly, it’s possible to protect private data or classified materials from being exposed.
In the days before digital documents, redaction involved a simple, if crude, process of covering text with a black marker. Since redactions were done by hand, it was easy for mistakes to be made, which could range from using insufficiently dark ink to leaving portions of text exposed. The development of high-powered photo enhancement has rendered this approach all but useless, as even inexpensive image processing technology can distinguish blacked-out text.
With the transition to digital documents, organizations finally have access to true redaction capabilities. Unfortunately, they still tend to make mistakes when it comes to flattened PDFs that could leave redacted context exposed and vulnerable.
What Is a Flattened PDF?
A modern PDF file consists of multiple layers, each of which can contain separate elements. One layer might feature text, another image, and yet another a fillable form. The flattening process removes all interactive elements from form fields and combines all of the document’s elements into a single layer.
Organizations frequently used this process to “lock in” form content to prevent anyone from altering the information after a user completes the forms. It also removes elements like dropdown selections within form fields and can burn in other annotations or markups, making them a permanently visible element of the document.
Flattened PDF Redactions
Unfortunately, simply flattening a PDF is usually not sufficient to securely redact a document. That’s because obscured elements are still present in the document; they’re just not visible when the file is viewed and printed.
Recovering improperly redacted content is actually quite trivial in many cases. Two of the most infamous recent examples include information released during the investigation of political campaign chairman Paul Manafort in 2019 and court documents related to Facebook’s use of personal data in 2017. In both cases, journalists were able to copy redacted text from PDF files and paste it into a text editor to reveal the obscured content.
There are typically two ways that improper redactions occur:
- Covering Text with Boxes: This frequent mistake occurs when people try to treat a digital document like a physical piece of paper. They place annotations over the sensitive content, usually in the form of a black box, and then save a flattened version of the PDF thinking that no one will be able to separate the text from the annotation element. As the Manafort and Facebook cases demonstrate, however, getting around these “redactions” is usually quite easy.
- Changing the Color of Text: Another common redaction error involves altering the color of the sensitive text to match the document background. Changing the text color to white, for instance, might make it invisible to the human eye, but it does nothing to alter the content itself. The text can be made visible again by using the copy/paste trick described above or by altering the background characteristics in another program.
The only way to make these methods viable for true redactions would be to actually print the documents with the content hidden and then scan them back into digital form, where OCR could be used to reconstruct a new file. But even in this case, there’s a chance that a powerful OCR engine might be able to pick up the hidden elements.
Using Proper Redaction Prior to Flattening with PrizmDoc Viewer
In order to redact documents securely, applications need to have access to specialized redaction tools that are capable of actually removing content from the document itself before applying redaction indicators. PrizmDoc Viewer’s redaction API can find and extract key text while also providing single or multiple reasons for the removal.
This not only allows organizations to redact documents quickly, but it also ensures that the redacted information won’t be exposed later because it no longer even exists within the document. More importantly, the outputted document is entirely new, so there is no deleted information to recover.
While most people are familiar with the distinctive black bars that indicate redacted content, even this leaves behind significant context clues that could provide hints of what was removed. Consider, for instance, a document involving multiple parties where the names of conversation participants have been redacted.
The following information:
The length of the redaction, then, would at least indicate when the redaction did not involve one person or the other. There are also many instances involving government documents where the length of the redacted information in classified material might suggest its relevance or importance.
When it comes to GovTech applications that need to remove large portions of information for security reasons, it often helps to perform redaction BEFORE turning a document into a flattened PDF. The PrizmDoc Viewer redaction API can be used to quickly extract text from a document and then redact it as a plain text file.
Unlike a static PDF document, plain text accounts for width variations, so all redactions can be replaced with a standardized <Text Redacted> marker that makes it impossible to know the length of the redacted content. The text could then be converted into a PDF after the redaction process is complete.
Take Control of PDFs with PrizmDoc Viewer
As a fully-featured HTML5 viewer, Accusoft’s PrizmDoc Viewer delivers powerful viewing, annotation, and conversion functionality to your web application. It provides a broad range of redaction capabilities that allow legal, financial, and government organizations to keep their sensitive data secure and protect their customers.
By integrating these complex features into your applications, you can focus your development efforts on building the tools that set your solution apart from the competition while our proven technology powers your customers’ viewing and redaction needs. To learn more about PrizmDoc Viewer’s powerful capabilities, download a free trial and test how it can support and enhance your application.
Barcode Xpress and ImageGear .NET. Barcode Xpress is a leading barcode reading SDK. While it supports a variety of image formats, Barcode Xpress works with ImageGear to support even more obscure image formats. For example, Barcode Xpress does not support reading barcodes on PDFs. Combined with ImageGear, developers can support a myriad of image formats and PDFs. With Barcode Xpress & ImageGear working together, developers can integrate a barcode reader that can detect barcodes on almost any kind of document.
Barcode Xpress accepts images in multiple different object types, such as System.Drawing.Bitmap. Using the method ImGearFileFormats.ExportPageToBitmap we can easily take any image that ImageGear supports and export it to a System.Drawing.Bitmap object that we can then pass to Barcode Xpress. So, only a tiny amount of code is required to recognize barcodes with ImageGear .NET and Barcode Xpress. Below, we’ll show various ways to pass different types of images and documents to Barcode Xpress.
Image:
// Load the image into the page.
ImGearPage imGearPage = ImGearFileFormats.LoadPage(stream, 0);
// Export the image to a bitmap and pass that bitmap to Barcode Xpress
Result[] results = barcodeXpress.reader.Analyze(ImGearFileFormats.ExportPageToBitmap(imGearPage));
PDF:
We need slightly more code for a PDF. First, we specify a page number when calling LoadPage. Second, we must dispose of the ImGearPage object after we’re done with it.
// Load the specified page of the PDF as an ImGearPage object
ImGearPage imGearPDFPage = ImGearFileFormats.LoadPage(stream, pageNumber);
// Export the image to a bitmap and pass that bitmap to Barcode Xpress
Result[] results = barcodeXpress.reader.Analyze(ImGearFileFormats.ExportPageToBitmap(imGearPDFPage));
(imGearPDFPage as IDisposable).Dispose();
Now that we’ve explained the most important part, we’ll show you a simple console app that recognizes barcodes on a PDF using the method above.
The code below assumes you’ve installed an evaluation or development license for both Barcode Xpress and ImageGear .NET.
using System;
using System.IO;
using Accusoft.BarcodeXpressSdk;
using ImageGear.Core;
using ImageGear.Evaluation;
using ImageGear.Formats;
using ImageGear.Formats.PDF;
namespace BXandIGDotNet
{
class Program
{
static int pageNumber = 0;
static string fileName = @"Path/To/Your/PDF..pdf";
static void Main(string[] args)
{
// Initialize evaluation license.
ImGearEvaluationManager.Initialize();
// Initialize common formats.
ImGearCommonFormats.Initialize();
// Add support for PDF and PS files.
ImGearFileFormats.Filters.Insert(0, ImGearPDF.CreatePDFFormat());
ImGearFileFormats.Filters.Insert(0, ImGearPDF.CreatePSFormat());
ImGearPDF.Initialize();
using (FileStream stream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
using (BarcodeXpress barcodeXpress = new BarcodeXpress())
{
// Load the specified page of the PDF as an ImGearPage object
ImGearPage imGearPDFPage = ImGearFileFormats.LoadPage(stream, pageNumber);
// Export the image to a bitmap and pass that bitmap to Barcode Xpress
Result[] results = barcodeXpress.reader.Analyze(ImGearFileFormats.ExportPageToBitmap(imGearPDFPage));
(imGearPDFPage as IDisposable).Dispose();
// Print the values of every barcode detected.
for (int i = 0; i < results.Length; i++)
{
Console.WriteLine("#" + i.ToString() + " Value: " + results[i].BarcodeValue);
}
Console.ReadKey();
}
}
}
}
Using Barcode Xpress and ImageGear in Other Languages & Linux
You can also use Barcode Xpress and ImageGear together outside of the .NET framework. Barcode Xpress supports several different programming languages and frameworks including .NET Core, Java, NodeJS, Python, C, and C++. All of which can be used on Linux.
ImageGear for C/C++ also supports Linux. Barcode Xpress Linux, which is a C/C++ library, ships with a sample called “ReadBarcodesIG”, that shows how to integrate Barcode Xpress Linux and ImageGear for C/C++. You can find the sample code after downloading our SDK here! For more information on Barcode Xpress, visit our Developer Resources page on the website. In addition, you can also find more information about ImageGear .NET on its respective Developer Resources page as well.
Over the last few years, codemantra has focused on developing document processing capabilities to enhance its core document management systems. The multifaceted collectionPoint platform leverages the power of machine learning to extract data and integrate with business applications such as LMS solutions, ERP software, and CRM systems. However, in order to maximize collectionPoint’s document flexibility, codemantra needed the right integrations to manage and edit PDF documents. Rather than devoting additional development resources to building a viewing solution in-house, the codemantra team instead conducted a thorough review of multiple third-party integrations to find the ideal match for collectionPoint. Find out why they chose PrizmDoc® for Java, formerly VirtualViewer®.